Systems on silicon show a continuous increase in complexity due to the ever
increasing need for implementing new features and improvements of existing
functions. This is enabled by the increasing density with which components can be
integrated on an integrated circuit. At the same time the clock speed at which circuits
are operated tends to increase too. The higher clock speed in combination with the
increased density of components has reduced the area which can operate
synchronously within the same clock domain. This has created the need for a modular
approach. According to such an approach the processing system comprises a plurality
of relatively independent, complex modules. In conventional processing systems the
system modules usually communicate with each other via a bus. As the number of
modules increases however, this way of communication is no longer practical for the
following reasons. On the one hand the large number of modules forms too high a bus
load. On the other hand the bus forms a communication bottleneck as it enables only
one device to send data to the bus. A communication network forms an effective way
to overcome these disadvantages. The communication network comprises a plurality
of partly connected nodes. Messages from a module are redirected by the nodes to one
or more other nodes. To that end the message comprises first information indicative
of the location of the addressed module(s) within the network. The message may
further include second information indicative of a particular location within the
module, such as a memory or a register address. The second information may invoke
a particular response of the addressed module.

It is an object of the invention to provide an integrated circuit and a method,
according to the introductory paragraph, which provide the modules therein a
relatively simple way of issuing messages.

In order to achieve said object the integrated circuit is characterized by the
characterizing portion of claim 1.

In the integrated circuit according to the invention modules can issue messages in a
simple way, by using a single address. This makes it possible for a module to perform
a write action to a particular memory address without being aware of where the memory
which comprises said address is located.

In this way the network appears to the module issuing the message as a bus. This
makes it relatively simple to incorporate already existing modules designed for a
bus-like architecture in an integrated circuit according to the invention.

As such, processing systems are known where a processor is coupled via a bus to
various memories, which each are mapped onto a respective portion of the total
address range. By way of example a ROM and a RAM may be mapped to a first and a
second address range respectively. When the processor performs a read instruction,
the address in the instruction defines at the same time which memory is selected to
read the data from.
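
The address-based selection described above can be sketched as follows. This is a
minimal illustration only: the map contents, the range boundaries and the names
ADDRESS_MAP and decode are assumptions made for the example and do not come from
the text.

```python
# Hypothetical address map: (start, end_exclusive, module name).
ADDRESS_MAP = [
    (0x0000, 0x4000, "ROM"),
    (0x4000, 0x10000, "RAM"),
]


def decode(address):
    """Split a single address into (module, local address within that module)."""
    for start, end, module in ADDRESS_MAP:
        if start <= address < end:
            return module, address - start
    raise ValueError(f"address {address:#x} is not mapped")
```

The issuing module only supplies the single address; which memory is selected
follows from the range the address falls in.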
In such known processing systems each of the various modules is directly coupled to
the bus. In the integrated circuit according to the invention [...]

selecting one of the modules implies that the one or other memories are set in a state
wherein they do not interfere with the bus traffic. Apart from the memory that is
addressed no other module is required to perform an action (in fact, they don't have
to, and don't need to know that another module is active, i.e. they don't have to be
"set in a state"), or 2) that multiple concurrent and/or pipelined messages can be
active simultaneously in the network as a whole. In an integrated circuit according to
the invention however, information issued by the active module is transferred as a
message via one or more nodes of the network. As a consequence it follows a
different route through the network depending on the address. This route is scheduled
by the network.

Examples of the two pieces of information that are arranged as a single address are:
1) Single logical memory space/map/range mapped to multiple distributed memories
each with their own physical memory ranges.
2) Virtual memory space mapped to a single logical memory space (distributed or not).
3) Multiple memory spaces/maps/ranges mapped to multiple distributed memories.
In 2) and 3) two translations may take place (virtual -> logical -> physical, and
multiple -> single -> physical).
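
The two translation steps mentioned for case 2) can be sketched as below. The page
size, the page table, the distribution of the logical space over two memories and all
names (PAGE_TABLE, LOGICAL_MAP, MEM_A, MEM_B, translate) are hypothetical and
serve only to make the example concrete.

```python
PAGE_SIZE = 0x1000

# Hypothetical page table: virtual page number -> logical page number.
PAGE_TABLE = {0: 2, 1: 0}

# Hypothetical distribution of the single logical space over two memories:
# (logical start, logical end_exclusive, memory name, physical base in that memory).
LOGICAL_MAP = [
    (0x0000, 0x2000, "MEM_A", 0x8000),
    (0x2000, 0x4000, "MEM_B", 0x0000),
]


def virtual_to_logical(vaddr):
    """First translation: virtual address -> logical address."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    return PAGE_TABLE[page] * PAGE_SIZE + offset


def logical_to_physical(laddr):
    """Second translation: logical address -> (memory, physical address)."""
    for start, end, memory, base in LOGICAL_MAP:
        if start <= laddr < end:
            return memory, base + (laddr - start)
    raise ValueError(f"logical address {laddr:#x} is not mapped")


def translate(vaddr):
    """Apply both translations in sequence."""
    return logical_to_physical(virtual_to_logical(vaddr))
```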

The integrated circuit of claim 3 and the method of claim 4 provide another way of
improving data transfer in an integrated circuit comprising a plurality of modules
connected by a network.

Theoretically a transaction could comprise any number of outgoing and/or return
messages; in practice however a transaction is made up of one or two outgoing
messages (from the first to the second module), and zero, one, or two return messages
(from the second to the first module). By managing the outgoing messages in a way
different from the return messages the overall performance of the integrated circuit
comprising the network is improved. This is further illustrated with the following
embodiments.

With reference to claim 5 it is remarked that GT connections can overbook resources
in some cases. For example, when an ANIP opens a GT read connection, it must
reserve slots for the read command messages, and for the read data messages. The
ratio between the two can be very large (e.g., 1:100), which leads either to large slot
tables, or bandwidth being wasted for the read command messages. In order to
prevent as much as possible that a reservation for guaranteed traffic would impede
other transactions the bandwidth which can be reserved should be restricted. On the
other hand the best-effort traffic may use any resources which are currently available.
As a consequence guaranteed traffic has bounded but on average higher latency than
best-effort traffic which has no fixed upper bound, but is (or should be) faster on
average.

With this recognition it has been found that the overall quality of the network
transport could be improved by exploiting BE packets for read command
messages, and GT packets for read data messages. No guarantees can be offered in
such a case, but the overall throughput can be higher and more stable than in the case
of using only BE packets.
With reference to claim 6 it is remarked that preferably the outgoing transactions are
handled in a locally ordered and the return transactions in a globally ordered
transaction mode. The one or more addressed modules process the transactions in the
order they have been issued, and the return parts of the transactions are all delivered to
the first module in the order in which it initiated the transactions. Even if ordered
channels are used, the responses from different addressed modules (e.g., in a
narrowcast connection) must be sorted at the first module. This kind of ordering
conforms with AMBA.

To implement global ordering, transactions that are delivered to different second
modules (also referred to as slaves) must be ordered exactly as they were sent by the
first module (also referred to as master). This means that the network should
have a global time indicator, and use e.g. deadline-based scheduling in the network,
while in addition assumptions on the consumption time of the second modules must be
available. An alternative way to introduce global ordering is to introduce explicit
dependencies between transactions. The latter can be done by using
acknowledged/tagged transactions, where proof of delivery to the slave is sent back to
the master using an acknowledgement message. This solution, however, introduces
extra latency because transactions are sequentialised with a round-trip delay/latency
per transaction (send a message, wait for the acknowledgement, send next message,
wait for next acknowledgement, etc.). By requiring only a local ordering for the
delivery of the outgoing transactions, the slaves, provided that they are autonomous
(which is usually the case), can execute messages independently.
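
The sorting of responses at the first module, needed for a globally ordered return
part when several slaves answer independently, might look like the sketch below. The
use of sequence numbers as tags and the class name ResponseSorter are assumptions
for illustration; the text does not prescribe a mechanism.

```python
class ResponseSorter:
    """Releases responses to the master in the order the transactions were issued."""

    def __init__(self):
        self._next_issue = 0    # sequence number for the next issued transaction
        self._next_deliver = 0  # sequence number the master expects next
        self._pending = {}      # responses that arrived early, keyed by sequence number

    def issue(self):
        """Tag a newly initiated transaction and return its sequence number."""
        seq = self._next_issue
        self._next_issue += 1
        return seq

    def receive(self, seq, response):
        """Accept a response and return the responses that are now deliverable in order."""
        self._pending[seq] = response
        ready = []
        while self._next_deliver in self._pending:
            ready.append(self._pending.pop(self._next_deliver))
            self._next_deliver += 1
        return ready
```

A response that overtakes an earlier one is held back until the earlier one has
arrived, so the master sees the issue order regardless of which slave answered first.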

With reference to claim 7 it is remarked that in this way buffer space is used in
an efficient way. A particular example is an embodiment wherein a large buffer
space is reserved for the buffer of the network interface coupled to an active
module, such as a module issuing a read command, and a small buffer space is
reserved for the buffer of the network interface coupled to a passive module, e.g.
the one receiving the read message.

In other situations there may be different types of flow control (e.g., you never want
to lose write commands, but don't mind losing read data). If a module can do both
read and write commands, it may be important that write transactions always succeed
(e.g., when writing to an interrupt controller), but that read transactions can be lost
because they can be retried (so the CMD of the read transaction is dropped and the
read never executed, or the RETDATA is dropped after the read has been executed).
Another example is that if you know that writes always succeed if they are delivered,
a flow-controlled connection is requested. Acknowledgements are not necessary in
that case; without flow control acknowledgements are compulsory, complicating the
master and causing additional traffic.

In the integrated circuit according to the invention the decision whether or not to
drop messages is not made per transaction but for the outgoing and return parts of a
connection as a whole. For example all outgoing messages (having the format
read+address or write+address+data) may be guaranteed lossless, while for all return
messages (whether read data or write acknowledgements) packets may be dropped.

A connection could be opened as follows: 
connid = open (
outgoing unordered/local/global,
outgoing buffer size,
return unordered/local/global,
return buffer size);

i.e., all outgoing messages have certain properties, and all return messages have
certain properties.
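
A runnable rendering of the call above could look as follows. It is a sketch under
assumptions: the type names, the field name ret (chosen because return is a
reserved word), the function name open_connection and the numbering of connection
identifiers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count


class Ordering(Enum):
    UNORDERED = "unordered"
    LOCAL = "local"
    GLOBAL = "global"


@dataclass(frozen=True)
class DirectionProperties:
    """Properties shared by all messages travelling in one direction."""
    ordering: Ordering
    buffer_size: int


@dataclass(frozen=True)
class Connection:
    connid: int
    outgoing: DirectionProperties
    ret: DirectionProperties


_connids = count(1)  # hypothetical source of connection identifiers


def open_connection(outgoing_ordering, outgoing_buffer_size,
                    return_ordering, return_buffer_size):
    """Open a connection whose outgoing and return parts are configured separately."""
    return Connection(
        connid=next(_connids),
        outgoing=DirectionProperties(outgoing_ordering, outgoing_buffer_size),
        ret=DirectionProperties(return_ordering, return_buffer_size),
    )
```

For a read-dominated connection one might, for instance, request local ordering
with a small outgoing buffer and global ordering with a larger return buffer.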

With reference to claim 8 it is remarked that in a processing system with modules
working asynchronously with respect to each other it is usual that a module receiving
data issues an acknowledge signal to inform the issuing processor that it has received
a message. In case a message is multicast a plurality of said acknowledge signals
is generated, which imposes a burden on the issuing processor. In the integrated
circuit of the invention the first module receives only a single message, which reduces
this burden. This measure is based on the insight that the network usually can
relatively easily generate the single return message in response to the plurality of
acknowledge messages of the second modules as a side effect of the functions already
[...]



With reference to claim 9: Depending on the situation the single return message can
depend on the acknowledge messages in various ways. The embodiment of claim 9 is
favorable where the addressed second modules are memories, and the first module
attempts to store data therein. In that case it is sufficient that only one copy of the
data is really received and stored.

With reference to claim 10: In other situations it is compulsory that each of the
addressed second modules has received the data. In the embodiment of claim 10 the
single return message is not generated until this is the case.

Otherwise the return message could be combined as follows.

If the write transaction has been successfully executed by all slaves, all will
return RETSTAT=RETOK, which can be combined by the ANIP in a single message to be
delivered to the master.

If the write transaction has been successfully executed only by some slaves, there
will be a mix of RETSTATs (RETOK and RETERROR). They can either be combined into

(a) a single RETSTAT=RETERROR, to specify that an error occurred, or

(b) a single RETSTAT, but a larger one, more descriptive, encoding
where there have been errors. All RETSTATs can be bundled together
in a single RETSTAT for the master, or <slave identifier, error code>
pairs can be bundled to form a single RETSTAT for the master.

If the connection has no flow control, messages can be dropped
at the PNIPs, resulting also in RETSTAT=RETLOST messages. Again, combinations
as those above can be made.
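
Options (a) and (b) can be sketched as two small functions. The function names, the
representation of the per-slave statuses as a dictionary and the choice to report
RETLOST under the compact RETERROR code are assumptions for this example only.

```python
RETOK, RETERROR, RETLOST = "RETOK", "RETERROR", "RETLOST"


def combine_compact(statuses):
    """Option (a): one status; RETOK only if every slave returned RETOK."""
    if all(status == RETOK for status in statuses.values()):
        return RETOK
    return RETERROR  # assumption: a lost message is also reported as an error


def combine_descriptive(statuses):
    """Option (b): one status plus <slave identifier, error code> pairs."""
    failures = [(slave, status) for slave, status in statuses.items()
                if status != RETOK]
    return (RETERROR if failures else RETOK), failures
```

Option (a) keeps the return message small, whereas option (b) lets the master find
out which slave failed without issuing further transactions.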
With reference to claim 11: In this way it is guaranteed that the first module always
receives a response to a transaction, even if the connection has no flow control (i.e.
data may be dropped). This is done by only dropping data in the PNIP (the network
interface coupled to the second, receiving module), and returning a FAIL/ERROR to
the ANIP (the network interface coupled to the first module). This return status
(RETSTAT) message will never be dropped because the ANIP that initiated the
transaction will reserve space for return messages of every transaction that it initiates.
This combination of reserving space and generating an error message whenever a
message is dropped is a way to introduce flow control. Preferably the RETSTAT
message is generated by the interface of the receiving module, although alternatively
it could be generated at the intermediary network nodes too.
The method according to the invention guarantees transaction completion, i.e. it is
always known whether an initiated transaction

(a) was delivered and executed successfully at the slave (RETSTAT=OK produced by
the slave), or

(b) was never delivered at the slave (RETSTAT=REQLOST produced by the PNIP),
or

(c) was delivered at the slave, but not successfully executed (RETSTAT=ERROR
produced by the slave), or

(d) was delivered and executed successfully at the slave but the response message was
dropped (RETSTAT=RETLOST produced by the ANIP).

This is achieved by either 

(i) not dropping messages (flow-controlled connection), in this case RETSTAT is
either OK or ERROR, or

(ii) by allowing messages to be dropped (on a connection without flow control), but
generating a RETSTAT (REQLOST or RETLOST) whenever the message is dropped,
or a RETOK or RETERROR as usual when the message is not dropped.

It is essential however never to drop RETSTATs, because this completes the
transaction. This is realized in that a buffer for the RETSTAT is located at the master's
ANIP. The latter reserves space for RETSTATs when initiating transactions, and
bounds the number of outstanding transactions (for finite sized RETSTAT buffers).
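
The reservation scheme can be illustrated with a small model of the RETSTAT buffer
at the ANIP. The class and method names are hypothetical, and the model assumes
one buffer slot per outstanding transaction.

```python
class RetstatBuffer:
    """Model of a finite RETSTAT buffer located at the master's ANIP."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reserved = 0   # slots held by transactions that are not yet consumed
        self.statuses = []  # RETSTATs that arrived and wait for the master

    def try_initiate(self):
        """Reserve a slot for a new transaction; refuse it if the buffer is full."""
        if self.reserved >= self.capacity:
            return False
        self.reserved += 1
        return True

    def complete(self, retstat):
        """Store a RETSTAT in the slot that was reserved when the transaction began."""
        if len(self.statuses) >= self.reserved:
            raise RuntimeError("RETSTAT without a reserved slot")
        self.statuses.append(retstat)

    def consume(self):
        """Hand the oldest RETSTAT to the master and free its slot."""
        retstat = self.statuses.pop(0)
        self.reserved -= 1
        return retstat
```

Because a slot is claimed before the request leaves the ANIP, a returning RETSTAT
always finds room, and the capacity bounds the number of outstanding transactions.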

The flow control on the outgoing and return connections is in principle independent.
Thus, for outgoing flow control & return flow control, the RETSTAT message is
according to a) or c) above.

In case of outgoing flow control & no return flow control, the RETSTAT message is
a) or c) or d) above.

In case of no outgoing flow control & return flow control, the RETSTAT message is
a) or b) or c) above.
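
The three combinations listed above follow one rule, which the sketch below makes
explicit. The function name is hypothetical, and the result for the fourth
combination (no flow control in either direction) is an extrapolation of that rule
rather than something the text states.

```python
def possible_outcomes(outgoing_fc, return_fc):
    """Return the set of outcomes (a)-(d) that a RETSTAT can report."""
    outcomes = {"a", "c"}  # delivered: executed successfully, or executed with an error
    if not outgoing_fc:
        outcomes.add("b")  # the request may be dropped before it reaches the slave
    if not return_fc:
        outcomes.add("d")  # the response may be dropped on its way back
    return outcomes
```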

Other embodiments are such an integrated circuit wherein the return message is a
message indicating whether the second module has received a message from the first
module. In this embodiment the return message can be very compact, e.g., one or two
bits to indicate one of the four options described above.
Alternatively or in addition a return message comprises an identification of the
message received by the second module.
1. I suggest "efficiency" instead of "performance", because performance is just one of the
factors. We may have the option to reduce the cost of the network (e.g., [...]) or
increase the performance (e.g., by adding more connections for the same resources).

2. This is an example for the use of different properties for outgoing and return parts. However,
more can be defined:

Acknowledged write transaction: write command + outgoing data use guaranteed throughput
(mode one in your example), and acknowledgment uses best effort (mode two in your example).
Moreover, except time-related guarantees, there is also a distinction on the buffering in both your and
the above example. For data messages there is potentially more buffering allocated than for commands
and acknowledgments. Consequently, for a read transaction (your example) buffers for the return part
would be larger than those for the outgoing part. For the acknowledged write (the example above),
buffers for the outgoing part are larger, and those for acknowledgments are smaller.

3. It is indeed possible to allocate different bandwidth as you suggest. However, there are also
limitations. We use a slot table, which contains a number of slots in a time window. Bandwidth is
reserved by allocating these slots to connections. For example, if we use a table with 100 slots for a
time frame of 1 µs, each slot will be allocated for 1/100 from 1 µs = 10 ns. If the network provides
1 Gb/s per link, the bandwidth per slot will be 1/100 from 1 Gb/s = 10 Mb/s. We can only allocate
multiples of 10 Mb/s for guaranteed throughput traffic.

For a read command generating [...], allocating the minimum bandwidth of 10 Mb/s would be
probably too much, as it will use only a small fraction of it. The bandwidth can indeed be used by
best-effort traffic, however not by other guaranteed throughput traffic. As a result, not all the traffic
for which guarantees are needed may fit in the slot table.

An alternative is to use more slots, but this increases the cost of the router. This is why a best-effort
command may be a better solution.
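
The arithmetic in note 3 can be checked with a few lines of code. The function names
and the 1 Mb/s command stream used in the usage note are hypothetical; the 100 slots,
the 1 µs frame and the 1 Gb/s link are the figures from the note.

```python
def slot_granularity(slots, frame_ns, link_bps):
    """Return (slot duration in ns, bandwidth per slot in bit/s)."""
    return frame_ns // slots, link_bps // slots


def slots_needed(required_bps, slots, link_bps):
    """Smallest number of whole slots whose combined bandwidth covers required_bps."""
    slot_bps = link_bps // slots
    return max(1, -(-required_bps // slot_bps))  # ceiling division, at least one slot
```

With 100 slots a command stream that needs only 1 Mb/s still occupies a full slot of
10 Mb/s, which is the waste the note describes.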

4. This definition is good for outgoing messages, as there is one source (ANIP) and potentially
multiple destinations (PNIPs). However, for return messages, we define global/local ordering as
follows. Global ordering means that responses from all PNIPs/slaves (i.e. sources in this
case) come in the same order as the transactions have been initiated (i.e., the same order as the
commands have been issued by the master to the ANIP). Local ordering guarantees the order of
responses only if they come from the same slave/PNIP.

Slave modules

6. We can only guarantee the order we offer transactions to the slave module, but the order of
processing depends on the module implementation. It can well decide to process transactions in a
different order (e.g., memory controller). For ordering we only require that responses are returned in
the same order as the transactions were accepted.

7. This is only valid for global ordering. For local ordering (i.e., order preserved only per slave),
if ordered transport channels are used, no sorting is necessary.

8. Global ordering of responses conforms with AMBA. Local ordering of responses does not.

9. I think it is meant that write transactions may be critical and we don't want to lose them, whereas
read transactions can be lost, because they can be tried later. See example below in the text.

10. The two commands (i.e., read and write) can indeed be sent from the same module. If we set
up a connection with flow control for the outgoing part both commands will be delivered. However, if
the return part has no flow control, the responses for read commands may be lost. In such a case, the
read transactions will fail. I think it is meant read transactions being lost, not read commands being
lost.

11. fc = flow control, nofc = no flow control

12. Buffer is reserved only for a return status message, such as an acknowledgment or an error
message. Buffer can be, but is not necessarily, reserved also for returned data.

13. Data can also be dropped at the ANIP [...]
for the return part. In such a case, a RETSTAT=RETLOST will be produced [...]
which accompanied the dropped [...]

14. Has reserved

15. Yes, this is true. Between routers, there is always [...]
lost. Data can be lost only in the network [...]
flow control [...]. Therefore, [...] message [...]
flow control is [...] implemented [...]



These and other aspects are described in more detail in the following three annexes:

1. Communication Services for Networks on Chip, pages 1-25, by Andrei
Rădulescu and Kees Goossens;

Further background information useful for implementing the invention can be found
at:

2. Networks on Silicon: Blessing or Nightmare?, pp 1-5, by Paul Wielage and
Kees Goossens (published), and

3. Trade-Offs in the Design of a Router with Combined Guaranteed and Best-
Effort Services for Networks on Chip, pp 1-6, by Edwin Rijpkema, Kees Goossens,
Andrei Rădulescu, Jef van Meerbergen, and Paul Wielage, submitted to and rejected
by ISSS 2002.
Networks on chip (NoC) have received considerable attention recently
as a solution to the interconnect problem in highly-complex chips [3-5, 7-9,
15, 19, 22]. The reason is twofold. First, NoCs help resolve the electrical
problems in new deep-submicron technologies, as they structure and
manage global wires [3-5, 7, 8]. At the same time they share wires, lowering
their number and increasing their utilization [7, 8]. NoCs can also be
energy efficient and reliable [4], and are scalable compared to buses [9].
Second, NoCs also decouple computation from communication, which is
essential in managing the design of billion-transistor chips [14, 22]. NoCs
achieve this decoupling because they are traditionally designed using
protocol stacks [21], which provide well-defined interfaces separating
communication service usage from service implementation [5, 22].

Using networks for on-chip communication when designing systems on
chip (SoC), however, raises a number of new issues that must be taken
into account. This is because, in contrast to existing on-chip interconnects
(e.g., buses, switches, or point-to-point wires), where the communicating
modules are directly connected, in a NoC the modules communicate
remotely via network nodes. As a result, interconnect arbitration changes
from centralized to distributed, and issues like out-of-order transactions,
latencies, and end-to-end flow control must be handled either by
the intellectual property block (IP) or by the network itself.

Most of these topics have already been the subject of research in the field
of computer networks [24] and parallel machine interconnect networks [6].
However, on-chip networks have different properties (e.g., tighter link
synchronization) and constraints (e.g., higher memory cost) leading to
different design choices, which in the end affect the network services.

In this paper, we compare NoCs and off-chip networks, showing both
their similarities and differences. We also explore the differences between
NoCs and existing on-chip interconnects. We list new issues that must be
resolved in system design due to the multi-hop nature of NoCs, and present
an interface which takes these issues into consideration. Our interface is
aimed at being similar to a split-transaction bus interface, such as VCI [25]
or OCP [17], to allow simple, low-cost wrappers to bus interfaces, and
to allow backward compatibility with existing IPs. Our interface uses a
request-response protocol that provides basic read and write operations.
But our interface extends bus interfaces to fully exploit the power of our
NoC [8, 19, 20]. For example, it offers connection-based communication
where end-to-end flow control and time-related guarantees (e.g., bounded
latency) can be requested.

The paper is organized as follows. In the next two sections we compare
NoC properties with those of off-chip networks and buses, respectively.
In Section IV, we present the services we offer in our network. Finally, we
present our conclusions.



Networks have been the subject of research for decades, both in the
context of local and wide area networks (computer networks) [24], and as
an interconnect for parallel machines [6]. Both are very much related to
on-chip networks, and many of the results in those fields are also applicable
on chip. However, NoC premises are different from off-chip networks,
and, therefore, most of the network design choices must be reevaluated.

NoCs differ from off-chip networks mainly in their constraints and
synchronization. Typically, most on-chip resources have much tighter
constraints compared to off-chip. Storage (i.e., memory) and computation
resources are relatively more expensive, whereas the number of point-to-
point links is larger on chip than off chip [7].

Storage is expensive, because general-purpose on-chip memory, such as
RAMs, occupies a large area. Having the memory distributed in the network
components in relatively small sizes is even worse, as the overhead area
in the memory then becomes dominant.

Also computation for on-chip networks comes at a relatively high cost
compared to off-chip networks. An off-chip network interface usually
contains a dedicated processor to implement the protocol stack up to network
layer or even higher, to offload the host processor from the communication
processing. Including a dedicated processor in a network interface is
not feasible on chip, as the size of the network interface will become
comparable to or larger than the IP to be connected to the network. Moreover,
running the protocol stack on the IP itself may also be not feasible,
because often these IPs have one dedicated function only, and do not have
the capabilities to run a network protocol stack.

The number of wires and pins to connect network components is an
order of magnitude larger on chip than off chip [7]. If they are not used
massively for other purposes than NoC, they allow wide point-to-point
interconnects (e.g., 300-bit links) [7, 15]. This is not possible off-chip, where
links are relatively narrower: 8-16 bits.

On-chip wires are also relatively short, allowing a much tighter
synchronization than off chip. This allows a reduction in the buffer space in
the routers because the communication can be done at a smaller
granularity. In the current semiconductor technologies, wires are also fast and
reliable, which allows simpler link-layer protocols (e.g., no need for
error correction, or retransmission). This also compensates for the lack of
memory and computational resources.

In the rest of the section, we list five network issues that have a direct
impact on the NoC cost: reliable communication, deadlock, data ordering,
network flow control and buffering strategy, and time-related guarantees.
For each of them, we discuss the differences and similarities for on- and
off-chip networks.

Reliable communication. A consequence of the tight on-chip
resource constraints is that the network components (i.e., routers and
network interfaces) must be fairly simple to minimize computation and
memory requirements. Luckily, on-chip wires provide a reliable communication
medium, which avoids the considerable overhead incurred by the off-chip
networks for providing reliable communication. Data integrity can be
provided at low cost at the data link layer. However data loss also depends
on the network architecture, as in most computer networks data is
simply dropped if congestion occurs in the network [6, 24]. On chip, dropping
data may lead to a too costly implementation of reliable communication.
We show below that a network where no data is dropped can lead to a much
lower-cost solution, at the peril of introducing the possibility of deadlock.

Deadlock. Computer network topologies have generally an irregular
(possibly dynamic) structure and bidirectional links, which can introduce
buffer cycles. In such topologies, packet dropping at the network nodes
may be required to avoid deadlocks.

Deadlock can also be avoided without dropping data, for example, by
introducing constraints either in the topology or routing. Fat-tree
topologies have already been considered for NoCs, where deadlock is avoided by
bouncing back packets in the network in case of overflow [9]. Tile-based
approaches to system design [7, 15, 23] use mesh or torus network
topologies, where deadlock can be avoided using, for example, a turn-model
routing algorithm [6].
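
As an illustration of constraining the routing, the sketch below shows
dimension-ordered (XY) routing on a mesh, which never turns from the Y
dimension back into the X dimension. It forbids more turns than a turn-model
algorithm needs to and is not taken from the paper; the function name and
the coordinate convention are assumptions.

```python
def xy_route(src, dst):
    """Return the hops (x, y) from src to dst on a mesh, moving along X first, then Y."""
    x, y = src
    path = []
    step = 1 if dst[0] > x else -1
    while x != dst[0]:  # finish all movement in the X dimension first
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:  # then move in the Y dimension only
        y += step
        path.append((x, y))
    return path
```

Because no packet ever waits for an X link while holding a Y link, the channel
dependencies on a mesh cannot form a cycle.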

An alternative solution for deadlock in NoCs, which takes into
consideration that modules connecting to the network are either masters (initiating
requests and receiving responses), or slaves (receiving requests and
sending back responses), is to maintain separate virtual networks (with separate
buffers) for requests and responses [6].

Data ordering. In a network, data sent from a source to a
destination may arrive out of order due to reordering in network nodes, following
different routes, or retransmission after dropping. For off-chip networks
out-of-order data delivery is typical. However, for NoCs where no data is
dropped, data can be forced to follow the same path between a source and
a destination (deterministic routing) with no reordering. This in-order data
transportation requires less buffer space, and reordering modules are no
longer necessary.



Network flow control and buffering strategy. Network flow
control and buffering strategy have a direct impact on the memory
utilization in the network. Wormhole routing requires only a flit buffer in the
router, whereas store-and-forward and virtual-cut-through routing require
at least the buffer space to accommodate a packet. Consequently, on chip,
wormhole routing may be preferred over virtual-cut-through or store-and-
forward routing. Similarly, input queuing may be a lower memory-cost
alternative to virtual-output-queuing or output-queuing buffering strategies
because it has fewer queues. Dedicated FIFO memory structures at a low cost
also enable on-chip usage of virtual-cut-through routing or virtual output
queuing for a better performance [19]. However, using virtual-cut-through
routing and virtual output queuing at the same time is still too costly.
[Figure 1 and Figure 2: interconnect examples]



Time-related guarantees. Off-chip networks typically use packet
switching and offer best-effort services. Contention can occur at each
network node, making latency guarantees very hard to offer. Throughput
guarantees can still be offered using schemes such as rate-based switching [26]
or deadline-based packet switching, but with high buffering costs.

An alternative to provide such time-related guarantees is to use
time-division multiple access (TDMA) circuits, where every circuit is dedicated
to a network connection. Circuits provide guarantees at a relatively low
memory and computation cost. Network resource utilization is increased
when the network architecture allows any left-over guaranteed bandwidth
to be used by best-effort communication [..., 19, 20].
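
How left-over guaranteed bandwidth can be handed to best-effort traffic is
sketched below for a single link and a single TDMA frame. The data structures
and the function name are assumptions made for the example; the paper does
not give an algorithm at this point.

```python
def schedule_frame(slot_table, gt_queues, be_queue):
    """Return what is sent in each slot of one frame.

    slot_table: per slot, the id of the connection owning it, or None if unreserved.
    gt_queues:  connection id -> list of pending guaranteed-throughput items.
    be_queue:   list of pending best-effort items.
    """
    sent = []
    for owner in slot_table:
        if owner is not None and gt_queues.get(owner):
            sent.append(gt_queues[owner].pop(0))  # the owner uses its reserved slot
        elif be_queue:
            sent.append(be_queue.pop(0))          # unreserved or unused slot goes to best effort
        else:
            sent.append(None)                     # nothing to send in this slot
    return sent
```

A reserved slot is never taken from its owner while the owner has data, so the
guarantee holds, yet no slot stays idle as long as best-effort data is waiting.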



III. From buses to NoCs

Introducing networks (Figure 1) as on-chip interconnects radically
changes the communication when compared to direct interconnects, such
as buses or switches (Figure 2). This is because of the multi-hop nature
of a network, where communication modules are not directly connected,
but separated by one or more network nodes. This is in contrast with the
prevalent existing interconnects (i.e., buses) where modules are directly
connected. The implications of this change reside in the arbitration (which
must change from centralized to distributed), and in the communication
properties (e.g., ordering, or flow control).
In this section, we list some of these topics, and outline the
differences between NoCs and buses. We refer mainly to buses as direct
interconnects, because currently they are the most used on-chip interconnect. Most
of the bus characteristics also hold for other direct interconnects (e.g.,
switches [16]). Multilevel buses are a hybrid between buses and NoCs.
Depending on the functionality of the bridges, for our purpose, multilevel
buses either behave like simple buses [2] or like NoCs.
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^gramming MbdeL the pr^grainmiAg model of a bus typically 
consists of load and store operations which are nnplcmeated as a se- 
quence of pdnoitive bos txansactions. Bus Inter&ces ^ically have deiS- 
cated groups of wires fbr oonuuand, address, wite daia» and tead data [X, 
12,13,17,251. 

A bus is a resource shared by multiple IPs. Therefore, before using it, IPs must go through an arbitration phase, where they request access to the bus, and block until the bus is granted to them.

A bus transaction involves a request and possibly a response. Modules issuing requests are called masters, and those serving requests are called slaves. If there is a single arbitration for a pair of request-response, the bus is called non-split. In this case, the bus remains allocated to the master of the transaction until the response is delivered, even when this takes a long time. Alternatively, in a split bus, the bus is released after the request to allow transactions from different masters to be initiated. However, a new arbitration must be performed for the response such that the slave can access the bus [11].

For both split and non-split buses, both communication parties have direct and immediate access to the status of the transaction. In contrast, network transactions are one-way transfers from an output buffer at the source to an input buffer at the destination that cause some action at the destination, the occurrence of which is not visible at the source [6]. The effects of a network transaction are observable only through additional transactions. A request-response type of operation is still possible, but requires at least two distinct network transactions. Thus, a bus-like transaction in a NoC will essentially be a split transaction.

Transaction Ordering. Traditionally, on a bus all transactions are ordered (cf. Peripheral VCI [25], AMBA [1], or CoreConnect PLB and OPB [12, 13]). This is possible at a low cost, because the interconnect, being a direct link between the communicating parties, does not reorder data. However, on a split bus, a total ordering of transactions on a single master may still cause performance penalties when slaves respond at different speeds. To solve this problem, recent extensions to bus protocols allow transactions to be performed on connections. Ordering of transactions within a connection is still preserved, but between connections there are no ordering constraints (e.g., OCP [17], or Basic VCI [25]). A few of the bus protocols allow out-of-order responses per connection in their advanced modes (e.g., Advanced VCI), but both requests and responses arrive at the destination in the same order as they were sent.

In a NoC, ordering becomes weaker. Global ordering can only be provided at a very high cost due to the conflict between the distributed nature of the networks, and the requirement of a centralized arbitration necessary for global ordering.

Even local ordering, between a source-destination pair, may be costly. Data may arrive out of order if it is transported over multiple routes. In such cases, to still achieve an in-order delivery, data must be labeled with sequence numbers and reordered at the destination before being delivered.
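The reordering step described above can be sketched as a small reorder buffer. This is only an illustrative model, not the mechanism of any particular network: the class name `ReorderBuffer`, the method `receive`, and the choice to start numbering at 0 are assumptions made for the example.

```python
class ReorderBuffer:
    """Deliver payloads in sequence-number order, holding back early arrivals."""

    def __init__(self):
        self.next_seq = 0   # next sequence number owed to the receiver (assumed to start at 0)
        self.pending = {}   # out-of-order arrivals, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one labeled item and return the payloads that are now deliverable in order."""
        self.pending[seq] = payload
        delivered = []
        # Drain every item that directly continues the in-order stream.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered
```

An item that arrives early is held until all its predecessors have arrived, which is the buffering cost the text refers to.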

Atomic Chains of Transactions. An atomic chain of transactions is a sequence of transactions initiated by a single master that is executed on a single slave exclusively. That is, other masters are denied access to that slave, once the first transaction in the chain claimed it. This mechanism is widely used to implement synchronization mechanisms between master modules (e.g., semaphores).

On a bus, atomic operations can easily be implemented, as the central arbiter will either (a) lock the bus for exclusive use by the master requesting the atomic chain, or (b) know not to grant access to a locked slave. In the former case, the time resources are locked is shorter because once a master has been granted access to a bus, it can quickly perform all the transactions in the chain (no arbitration delay is required for the subsequent transactions in the chain). Consequently, the locked slave and the bus can be opened up again in a short time. This approach is used in AMBA and CoreConnect. In the latter case, the bus is not locked, and can still be used by other modules, however, at the price of a longer locking time of the slave. This approach is used in VCI and OCP.

In a NoC, where the arbitration is distributed, masters do not know that a slave is locked. Therefore, transactions to a locked slave may still be initiated, even though the locked slave cannot accept them. Consequently, to prevent deadlock, these other transactions must be either dropped, or stored such that transactions in the atomic chain can still be served. Moreover, the time a module is locked is much longer in the case of NoCs, because of the higher latency per transaction.

Deadlock. In buses, deadlocks are not generally an issue. Deadlock can still occur at the application level (e.g., an atomic chain of transactions that locks the bus and is never finished), but it is not caused by the interconnect itself.

In a network, deadlock becomes a more important issue, and special care has to be taken in the network design to avoid deadlock. Deadlock is mainly caused by cycles in the buffers. To avoid deadlock, either network nodes must drop packets when their buffers are filled, or routing must be cycle-free. In a NoC, we believe the latter is preferable, because of its lower cost in achieving reliable communication (see Section II).

A second cause of deadlock is atomic chains of transactions. The reason is that while a module is locked, the queues storing transactions may get filled with transactions outside the atomic transaction chain, blocking the access of the transactions in the chain to the locked module. If atomic transaction chains must be implemented (to be compatible with processors allowing this, such as MIPS), the network nodes should be able to filter the transactions in the atomic chain, or be allowed to drop those blocking them.

Media Arbitration. An important difference between buses and NoCs is in the media arbitration scheme. In a bus, master modules request access to the interconnect, and the arbiter grants the access for the whole interconnect at once. Arbitration is centralized, as there is only one arbiter component, and global, as all the requests as well as the state of the interconnect are visible to the arbiter. In a non-split bus, arbitration takes place once when a transaction is initiated. As a result, the bus is granted for both request and response. In a split bus, requests and responses are arbitrated separately.

In a NoC, arbitration is also necessary, as it is a shared interconnect. However, in contrast to buses, the arbitration is distributed, because it is performed in every router, and is based only on local information. Arbitration of the communication resources (links, buffers) is performed incrementally as the request or response advances [19].

Destination Name and Routing. For a bus, the command, address, and data are broadcast on the interconnect. They arrive at every destination, one of which decodes the broadcast address and executes the requested command. This is possible because all modules are directly connected to the same bus.

In a NoC, it is not feasible to broadcast information to all destinations, because it must be copied to all routers and network interfaces. This floods the network with data. The address is better decoded at the source to find a route to the destination module. A transaction address will therefore have two parts: (a) a destination identifier, and (b) an internal address at the destination.
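A two-part transaction address of this kind can be sketched as a simple bit-field split. The field widths `DEST_BITS` and `LOCAL_BITS` and the function names are hypothetical, chosen only for illustration; the source does not specify an address layout.

```python
DEST_BITS = 8    # assumed width of the destination identifier
LOCAL_BITS = 24  # assumed width of the internal address at the destination

def encode_address(dest_id, local_addr):
    """Pack a destination identifier and an internal address into one word."""
    if not 0 <= dest_id < (1 << DEST_BITS):
        raise ValueError("destination identifier out of range")
    if not 0 <= local_addr < (1 << LOCAL_BITS):
        raise ValueError("internal address out of range")
    return (dest_id << LOCAL_BITS) | local_addr

def decode_address(address):
    """Split a packed address at the source into (destination id, internal address)."""
    return address >> LOCAL_BITS, address & ((1 << LOCAL_BITS) - 1)
```

Decoding at the source lets the destination identifier select a route, while only the internal address needs to be interpreted at the destination.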

Latency. Transaction latency is caused by two factors: (a) the access time to the bus, which is the time until the bus is granted, and (b) the latency introduced by the interconnect to transfer the data.

For a bus, where the arbitration is centralized, the access time is proportional to the number of masters connected to the bus. The transfer latency itself typically is constant and relatively fast, because the modules are linked directly. However, the speed of transfer is limited by the bus speed, which is relatively low.

In a NoC, arbitration is performed at each router for the following link. The access time per router is small. Both end-to-end access time and transport time increase proportionally to the number of hops between master and slave. However, network links are unidirectional and point to point, and hence can run at higher frequencies than buses, thus lowering the latency.




From a latency perspective, using a bus or a network is a trade-off between the number of modules connected to the interconnect (which affects the access time), the speed of the interconnect, and the network diameter.

Data Format. In most modern bus interfaces the data format is defined by separate wire groups for the transaction type, address, write data, read data, and return acknowledgments or errors (e.g., CoreConnect). This is used to pipeline transactions. For example, concurrently with sending the address of a read transaction, the data of a previous write transaction can be sent, and the data from an even earlier read transaction can be received. Moreover, having dedicated wire groups simplifies the transaction decoding; there is no need for a mechanism to select between different kinds of data sent over a common set of wires.

Inside a network, there is typically no distinction between different kinds of data. Data is treated uniformly, and passed from one router to another. This is done to minimize the control overhead and buffering in routers. If separate wires were used for each of the above-mentioned groups, separate routing, scheduling, and queuing would be needed, increasing the cost of routers.

In addition, in a network at each layer in the protocol stack, control information must be supplied together with the data (e.g., command, address, or data size). This control information is organized as an envelope around the data. That is, first a header is sent, followed by the actual data (payload), followed possibly by a trailer. Multiple such envelopes may be provided for the same data, each carrying the corresponding control information for each layer in the network protocol stack [6, 24].

Buffering and Flow Control. Buffering data of a master (output buffering) is used both for buses and NoCs to decouple computation from communication. However, for NoCs output buffering is also needed to marshal data, which consists of (a) (optionally) splitting the outgoing data into smaller packets which are transported by the network, and (b) adding control information for the network around the data (packet header). To avoid output buffer overflow the master must not initiate transactions that generate more data than the currently available space.

Similarly to output buffering, input buffering is also used to decouple computation from communication. In a NoC, input buffering is also required to unmarshal data.

In addition, flow control for input buffers differs for buses and NoCs. For buses, the source and destination are directly linked, and the destination can therefore signal directly to a source that it cannot accept data. This information can even be available to the arbiter, such that the bus is not granted to a transaction trying to write to a full buffer.

In a NoC, however, the destination of a transaction cannot signal directly to a source that its input buffer is full. Consequently, transactions to a destination can be started, possibly from multiple sources, after the destination's input buffer has filled up. Two policies can be adopted when an input buffer is full. The first is not to accept additional incoming transactions, and to store them in the network. However, this approach can easily lead to network congestion, as the data could eventually be stored all the way to the sources, blocking the links in between. The second approach is to accept incoming transactions at a full destination, and drop some data in the input buffer. Congestion is avoided, but data is lost, which is not desirable.

To avoid input buffer overflow, connections can be used, together with end-to-end flow control. At connection set up between a master and one or more slaves, buffer space is allocated at the network interfaces of the slaves, and the network interface of the master is assigned credits reflecting the amount of buffer space at the slaves. The master can only send data when it has enough credits for the destination slave(s). The slaves grant credits to the master when they consume data.
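The credit scheme described above can be sketched for a single master and a single slave as follows. This is a minimal software model under stated assumptions: the class `CreditedLink` and its methods are hypothetical names, one credit stands for one buffer slot, and the network between the two interfaces is abstracted away.

```python
from collections import deque

class CreditedLink:
    """Master-side credit counter paired with a slave-side input buffer."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots   # credits mirror the free buffer space at the slave
        self.slave_buffer = deque()

    def send(self, word):
        """Send one word if a credit is available; return whether it was sent."""
        if self.credits == 0:
            return False              # the master must wait, so nothing enters the network
        self.credits -= 1
        self.slave_buffer.append(word)
        return True

    def consume(self):
        """The slave consumes one word, which returns one credit to the master."""
        word = self.slave_buffer.popleft()
        self.credits += 1
        return word
```

Because a word is only sent when a slot is known to be free, the slave-side buffer can never overflow, and no data has to be stalled in the network or dropped.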



IV. The Æthereal Approach

As described in the previous two sections, NoCs have different properties from both existing off-chip networks and existing on-chip interconnects. As a result, existing protocols and service interfaces cannot be adopted directly to NoCs, but must take the characteristics of NoCs into account. For example, a protocol such as TCP/IP assumes the network is lossy, and includes significant complexity to provide reliable communication. Therefore, it is not suitable in a NoC, where we assume data transfer reliability is already solved at a lower level. On the other hand, existing on-chip protocols such as VCI, OCP, AMBA, or CoreConnect are also not directly applicable. For example, they assume ordered transport of data: if two requests are initiated from the same master, they will arrive in the same order at the destination. This does not hold automatically for NoCs. Atomic chains of transactions and end-to-end flow control also need special attention in a NoC interface.

Our objectives when defining our network services are the following. First, the services abstract from the network internals as much as possible. This is a key ingredient in tackling the challenge of decoupling the computation from communication [14, 22], which allows IPs (the computation part) and the interconnect (the communication part) to be designed independently from each other. As a consequence, our services are positioned at the transport layer in the ISO-OSI reference model [24], which is the first layer to be independent of the implementation of the network.

Second, we aim at a NoC interface as close as possible to a bus interface. NoCs can then be introduced non-disruptively: with minor changes, existing IPs, methodologies and tools can continue to be used. As a consequence, we use a request-response interface, similar to interfaces for split buses [1, 12, 13, 17, 25].

Third, our interface extends traditional bus interfaces to fully exploit the power of NoCs. For example, we offer connection-based communication which does not only relax ordering constraints (as for buses), but also enables new communication properties, such as end-to-end flow control based on credits, or guaranteed throughput [8, 19, 20]. All these properties can be set for each connection individually.



A. The Æthereal Connection and Transaction Model

IPs interact with our network [8, 19, 20] at so-called network interfaces (NIs). NIs provide NI ports (NIPs) through which the communication services are accessed. As shown in Figure 3, an NI can have several NIPs to which one or more IPs (computation elements or memories, but not interconnection elements) can be connected. Similarly, an IP can be connected to more than one NI and NIP.

Figure 3. Examples of links between NIs and IPs.

Communication between NIPs is performed on connections. Connections are introduced to describe and identify communication with different properties, such as guaranteed throughput, bounded latency and jitter, ordered delivery, or flow control. For example, to distinguish and independently guarantee communication of 1 Mb/s and 25 Mb/s, two connections can be used. Two NIPs can be connected by multiple connections, possibly with different properties. Connections as defined here are similar to the concept of threads and connections from OCP and VCI. Where in OCP and VCI connections are used only to relax transaction ordering, we generalize from only the ordering property to include configuration of buffering and flow control, guaranteed throughput, and bounded latency per connection.

Æthereal connections must be created with the desired properties before being used. This may result in resource reservations inside the network (e.g., buffer space, or percentage of the link usage per time unit). If the requested resources are not available, the network will refuse the request. After usage, connections are closed, which leads to freeing the resources occupied by that connection.


To allow more flexibility in configuring connections, and, hence, better resource allocation per connection, the outgoing and return parts of connections are configured separately. For example, different buffer space can be allocated in the ANIP and PNIPs, respectively, or different bandwidths can be reserved for requests and responses.

Depending on the requested services, the time to handle a connection (i.e., creating, closing, modifying services) can be short (e.g., creating/closing an unordered, lossy, best-effort connection) or significant (e.g., creating/closing a multicast guaranteed-throughput connection). Consequently, connections are assumed to be created, closed, or modified infrequently, coinciding e.g. with reconfiguration points, when the application requirements change.

Communication takes place on connections using transactions, consisting of a request and possibly a response. The request encodes an operation (e.g., read, write, flush, test and set, nop) and possibly carries outgoing data (e.g., for write commands). The response returns data as a result of a command (e.g., read) and/or an acknowledgment.

Figure 4. Transaction composition (from ANIP to PNIP: CMD, OUTDATA; from PNIP to ANIP: RETDATA, RETSTAT).

Connections involve at least two NIPs. Transactions on a connection are always started at one and only one of the NIPs, called the connection's active NIP (ANIP). All the other NIPs of the connection are called passive NIPs (PNIPs).

There can be multiple transactions active on a connection at a time (as for split buses). That is, transactions can be started at the ANIP of a connection while responses for earlier transactions are pending. If a connection has multiple slaves, multiple transactions can be initiated towards different slaves. Transactions are also pipelined between a single pair of a master and a slave for both requests and responses. In principle, transactions can also be pipelined within a slave, if the slave allows this.

A transaction is composed from the following messages (see Figure 4):

• A command message (CMD) is sent by the ANIP, and describes the action to be executed at the slave connected to the PNIP. Examples of commands are read, write, test and set, and flush. Commands are the only messages that are compulsory in a transaction. For NIPs that allow only a single command with no parameters (e.g., fixed-size address-less write), we assume the command message still exists, even if it is implicit (i.e., not explicitly sent by the IP).

• An out data message (OUTDATA) is sent by the ANIP following a command that requires data to be executed (e.g., write, multicast, and test-and-set).

• A return data message (RETDATA) is sent by a PNIP as a consequence of a transaction execution that produces data (e.g., read, and test-and-set).

• A completion acknowledgment message (RETSTAT) is an optional message which is returned by a PNIP when a command has been completed. It may signal either a successful completion or an error. For transactions including both RETDATA and RETSTAT the two messages can be combined in a single message for efficiency. However, conceptually, they both exist: RETSTAT to signal the presence of data or an error, and RETDATA to carry the data. In bus-based interfaces RETDATA and RETSTAT typically exist as two separate signals [1, 12, 13, 17, 25].

Figure 5. Transaction examples: read, unacknowledged write, and acknowledged write.

Messages composing a transaction are divided into outgoing messages, namely CMD and OUTDATA, and response messages, namely RETDATA and RETSTAT. Within a transaction, CMD precedes all other messages, and RETDATA precedes RETSTAT if present. These rules apply both between master and ANIP, and between PNIP and slave. Examples of transactions are shown in Figure 5.
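The two ordering rules above can be expressed as a small check on the message sequence of one transaction. The function name and the representation of messages as strings are assumptions for illustration; the additional requirement of exactly one CMD per transaction is also an assumption of this sketch.

```python
ALLOWED = {"CMD", "OUTDATA", "RETDATA", "RETSTAT"}

def is_valid_transaction(messages):
    """Check one transaction's message names against the ordering rules.

    Rules checked: CMD comes first (and, by assumption, occurs exactly once),
    and RETDATA precedes RETSTAT when both are present.
    """
    if not messages or messages[0] != "CMD":
        return False
    if messages.count("CMD") != 1 or any(m not in ALLOWED for m in messages):
        return False
    if "RETDATA" in messages and "RETSTAT" in messages:
        return messages.index("RETDATA") < messages.index("RETSTAT")
    return True
```

For example, `["CMD", "RETDATA", "RETSTAT"]` models a read, while `["CMD", "OUTDATA"]` models an unacknowledged write.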

We classify connections as follows (see Figure 6):

• A simple connection is a connection between one ANIP and one PNIP.

• A narrowcast connection is a connection between one ANIP and one or more PNIPs, in which the ANIP initiates transactions that are executed by exactly one PNIP. An example of the narrowcast connection is shown in Figure 7, where the ANIP performs transactions on an address space which is mapped on two memory modules. Depending on the transaction address, a transaction is executed on only one of these two memories.

• A multicast connection is a connection between one ANIP and one or more PNIPs, in which the sent messages are duplicated and each PNIP receives a copy of those messages. In a multicast connection no return messages are currently allowed, because of the large traffic they generate (i.e., one response per destination). It could also increase the complexity in the ANIP because individual responses from PNIPs must be merged into a single response for the ANIP. This requires buffer space and/or additional computation for the merging itself.

Figure 6. Connection types.

Figure 7. Narrowcast connection.
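The narrowcast case, where the transaction address selects exactly one PNIP, can be sketched as a lookup in an address map. The function name, the tuple layout of the map, and the example memory names are hypothetical; the ranges are assumed not to overlap.

```python
def select_pnip(address, address_map):
    """Return (pnip, offset) for the single PNIP whose range contains the address.

    address_map is a list of (base, size, pnip_name) tuples with
    non-overlapping ranges (an assumption of this sketch).
    """
    for base, size, pnip in address_map:
        if base <= address < base + size:
            return pnip, address - base
    raise ValueError("address is not mapped to any PNIP")
```

With two memories mapped back to back, for instance `[(0x0000, 0x1000, "mem0"), (0x1000, 0x1000, "mem1")]`, each transaction is executed on only one of them, as in the example of Figure 7.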



B. Connection Properties

In this section, we describe the properties that can be configured for a connection: guaranteed message integrity, guaranteed transaction completion, various transaction orderings, guaranteed throughput, bounded latency and jitter, and connection flow control.

Data Integrity. Data integrity means that the payload of the messages is not changed (accidentally or not) during transport. We assume that data integrity is already solved at a lower layer in our network, namely at the link layer, because in current on-chip technologies data can be transported uncorrupted over links. Consequently, our network interface always guarantees that messages are delivered uncorrupted at the destination.

Transaction Completion. A transaction without a response is said to be complete when it has been executed by the slave. As there is no response message to the master, no guarantee regarding transaction completion can be given.

Figure 8. Message ordering is observable at a, b, c, and d.

A transaction with a response is said to be complete when a RETSTAT message is received from the ANIP. The transaction may either (a) be executed successfully, in which case a success RETSTAT is returned, (b) fail in its execution at the slave, and then an execution error RETSTAT is returned, or (c) fail because of buffer overflow in a connection with no flow control, and then it reports an overflow error.

In our network, routers do not drop data [20]. Therefore, messages are always guaranteed to be delivered at the NI. For connections with flow control, NIs also do not drop data. Thus, message delivery to the IPs is guaranteed automatically in this case.

However, if there is no flow control, messages may be dropped at the network interface in case of buffer overflow (see the paragraph on end-to-end flow control below). All of CMD, OUTDATA, and RETDATA may be dropped at the NI. To guarantee transaction completion, RETSTAT is not allowed to be dropped. Consequently, in the ANIPs enough buffer space must be provided to accommodate RETSTAT messages for all outstanding transactions. This is enforced by bounding the number of outstanding transactions.
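Bounding the number of outstanding transactions can be sketched as a counter that is compared against the RETSTAT buffer space reserved in the ANIP. The class and method names are hypothetical, and one buffer slot per outstanding transaction is an assumption of the example.

```python
class OutstandingLimiter:
    """Caps in-flight transactions at the number of reserved RETSTAT slots."""

    def __init__(self, retstat_slots):
        self.limit = retstat_slots
        self.outstanding = 0

    def try_issue(self):
        """Start a transaction only if a RETSTAT slot is guaranteed for it."""
        if self.outstanding == self.limit:
            return False
        self.outstanding += 1
        return True

    def on_retstat(self):
        """A RETSTAT was handed to the master, so its slot becomes free again."""
        if self.outstanding == 0:
            raise RuntimeError("RETSTAT received with no outstanding transaction")
        self.outstanding -= 1
```

Since a new transaction is refused while all slots are claimed, every RETSTAT that comes back finds space, and so it never has to be dropped.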



Transaction Ordering. In this section, we describe the ordering requirements between different transactions within a single connection. Over different connections no ordering of transactions is defined at the transport layer.



Footnote 1: We assume that when data is received as a response (RETDATA), a RETSTAT (possibly implicit) is also received to validate the data.

There are several points in a connection where the order of transactions can be observed (see Figure 8): (a) the order in which the master presents CMD messages to the ANIP, (b) the order in which the CMDs are delivered to the slave by the PNIP, (c) the order in which the slave presents the responses to the PNIP, and (d) the order the responses are delivered to the master by the ANIP. Note that not all of (b), (c), and (d) are always present. Moreover, there are no assumptions about the order in which the slaves execute transactions; we can only observe the order of the responses. We consider the order of the transaction execution to be a system decision, and not a part of the interconnect protocol.

At both ANIP and PNIPs, outgoing messages belonging to different transactions on the same connection are allowed to be interleaved. For example, two write commands can be issued, and only afterwards their data follows. If the order of OUTDATA messages differs from the order of CMD messages, transaction identifiers must be introduced to associate OUTDATAs with their corresponding CMD.

Outgoing messages can be delivered by the PNIPs to the slaves (see Figure 8-b) as follows:

• Unordered, which imposes no order on the delivery of the outgoing messages of different transactions at the PNIPs.

• Ordered locally, where transactions must be delivered to each PNIP in the order they were sent, but no order is imposed across PNIPs. Locally-ordered delivery of the outgoing messages can be provided either by an ordered data transportation, or by reordering of outgoing messages at the PNIP.

• Ordered globally, where transactions must be delivered in the order they were sent, across all PNIPs of the connection. Globally-ordered delivery of the outgoing part of transactions requires a costly synchronization mechanism.

Transaction response messages can be delivered by the slaves to the PNIPs (see Figure 8-c) as follows:

• Ordered, when RETDATA and RETSTAT messages are returned in the same order as the CMDs were delivered to the slave.

• Unordered, otherwise.

When responses are unordered, there has to be a mechanism to identify the transaction to which a response belongs. This is usually done using tags attached to messages for transaction identification (similar to tags in VCI).

Response messages can be delivered by the ANIP to the master (see Figure 8-d) as follows:

• Unordered, which imposes no order on the delivery of responses. Here, also, tags must be used to associate responses with their corresponding CMDs.

• Ordered locally, where responses of transactions for a single slave are delivered in the order the original CMDs were presented by the master to the ANIP. Note that there is no ordering imposed for transactions to different slaves within the same connection.

• Globally ordered, where all responses in a connection are delivered to the master in the same order as the original CMDs. When transactions are pipelined on a connection, then globally-ordered delivery of responses requires reordering at the ANIP.

All 3 × 2 × 3 = 18 combinations between the above orderings are
possible. Out of these, we define and offer the following two. An unordered
connection is a connection in which no ordering is assumed in any part
of the transactions. As a result, the responses must be tagged to be able
to identify to which transaction they belong. Implementing unordered
connections has low cost; however, they may be harder to use, and introduce
the overhead of tagging.

An ordered connection is defined as a connection with local ordering
for the outgoing messages from PNIPs to slaves (Figure 8-b), ordered
responses at the PNIPs (Figure 8-c), and global ordering for responses at the
ANIP (Figure 8-d). We choose local ordering for the outgoing part because
the global ordering has a too high cost, and has few uses. The ordering of
responses is selected to allow a simple programming model with no tagging.
Global ordering at the ANIP is possible at a moderate cost, because
all the ordering is done locally in the ANIP.
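
As a minimal sketch of how reordering at the ANIP could work, the Python class below buffers responses that arrive early and releases them in the order in which the commands were issued. The class name, the method names, and the use of integer tags are illustrative assumptions and are not part of the interface described in the text.

```python
from collections import deque


class ResponseReorderBuffer:
    """Hypothetical ANIP-side buffer that restores CMD issue order."""

    def __init__(self):
        self._issue_order = deque()  # tags in the order the CMDs were issued
        self._arrived = {}           # tag -> response that arrived early

    def issue(self, tag):
        """Record that the master issued a CMD carrying this tag."""
        self._issue_order.append(tag)

    def receive(self, tag, response):
        """Accept a response from any slave; return the responses that
        can now be delivered to the master, in CMD order."""
        self._arrived[tag] = response
        ready = []
        # Release responses only while the oldest outstanding CMD is answered.
        while self._issue_order and self._issue_order[0] in self._arrived:
            ready.append(self._arrived.pop(self._issue_order.popleft()))
        return ready
```

With three pipelined commands, a response to the third command is held back until the first two have been answered, which is the behaviour a globally ordered response path needs.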

A user can emulate connections with global ordering at the PNIPs using
non-pipelined acknowledged transactions.

Connection latency, throughput, and jitter. In our network,
throughput can be reserved for connections in a time-division multiple
access (TDMA) fashion, where bandwidth is split in fixed-size slots on a
fixed time frame. Bandwidth, as well as bounds on latency and jitter, can
be guaranteed when slots are reserved. They are all defined in multiples of
the slots.

Guaranteed-throughput connections can overbook resources in some
cases. For example, when an ANIP opens a guaranteed-throughput read
connection, it must reserve slots for the read command messages, and for
the read data messages. The ratio between the two can be very large,
which leads either to a large number of slots, or to bandwidth being
wasted for the read command messages.

To solve this problem, we allow the request and response parts of a
connection to be configured independently for all of throughput, latency and
jitter. Consequently, the request part of a connection can be best effort,
while the response can have guaranteed throughput (or vice versa). For
the example mentioned above, we can use best-effort read messages, and
guaranteed-throughput read-data messages. No global connection guarantees
can be offered in this case, but the overall throughput can be higher
and more stable than in the case of using only best-effort traffic.
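
The overbooking effect can be made concrete with a small calculation. The sketch below assumes a link whose bandwidth is divided evenly over a fixed number of slots per frame; the function names and all numbers (a link of 1000 units, 16 slots, a command stream of 10 units and a data stream of 320 units) are hypothetical and chosen only for illustration.

```python
def slots_needed(required_bw, link_bw, slots_per_frame):
    """Smallest number of slots whose combined share of the link covers
    required_bw. All arguments are integers in the same bandwidth unit."""
    if not 0 <= required_bw <= link_bw:
        raise ValueError("requirement must lie between 0 and the link bandwidth")
    # Integer ceiling division of required_bw * slots_per_frame by link_bw.
    return -(-required_bw * slots_per_frame // link_bw)


def reserved_bw(slots, link_bw, slots_per_frame):
    """Bandwidth actually set aside when this many slots are reserved."""
    return slots * link_bw / slots_per_frame


# Read commands need little bandwidth but still occupy a whole slot.
cmd_slots = slots_needed(10, 1000, 16)
data_slots = slots_needed(320, 1000, 16)
wasted_on_commands = reserved_bw(cmd_slots, 1000, 16) - 10
```

In this example the command stream reserves a full slot although it needs only a fraction of it, which is the waste that configuring the request part as best effort avoids.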



Connection flow control. As mentioned earlier, our network guarantees
that messages are delivered to the NI. Messages sent from one of the
NIPs are not immediately visible at the other NIP, because of the multi-hop
nature of networks. Consequently, handshakes over a network would allow
only a single message to be transmitted at a time. This limits the throughput
on a connection and adds latency to transactions. To solve this problem,
and achieve a better network utilization, the messages must be pipelined.
In this case, if the data is not consumed at the PNIP at the same rate it
arrives, either flow control must be introduced to slow down the producer,
or data may be lost because of limited buffer space at the consumer NI.

We introduce end-to-end flow control at the level of connections, which
requires buffer space to be associated with connections. End-to-end flow
control ensures that messages are sent over the network only when there is
enough space in the NIP's destination buffer to accommodate them.

End-to-end flow control is optional (i.e., to be requested when the
connection is opened) and can be configured independently for the outgoing and
return paths. When no flow control is provided, messages are dropped when
buffers overflow. Multiple policies of dropping messages are possible, as
in off-chip networks. Possible scenarios include: (a) the oldest message is
dropped (milk policy), or (b) the newest message is dropped (wine
policy) [24].

We opt for a credit-based flow control. Credits are associated with the
empty buffer space at the receiver NI. The sender's credit is lowered as data
is sent. When data is delivered at the receiver NIP, credits are granted to
the sender. If the sender's credit is not sufficient to send some data, the NI
at the sender stalls the sending.
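
The credit mechanism can be sketched in a few lines of Python. The class and method names are hypothetical, and credits are counted in buffer words purely as an assumption for this illustration.

```python
class CreditSender:
    """Hypothetical sender-side NI state for credit-based flow control."""

    def __init__(self, receiver_buffer_words):
        # Initially the sender may fill the whole (empty) receiver buffer.
        self.credits = receiver_buffer_words

    def try_send(self, words):
        """Send only if enough credit is available; otherwise stall."""
        if words > self.credits:
            return False          # not enough space at the receiver: stall
        self.credits -= words     # credit is lowered as data is sent
        return True

    def grant(self, words):
        """The receiver reports freed buffer space back to the sender."""
        self.credits += words
```

Because data is only sent when the receiver has room for it, no message is dropped for lack of buffer space, at the price of occasionally stalling the sender.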



To illustrate the need for differentiated services in connections, we
show in this section some examples of traffic. We describe the properties
they would use over an Æthereal connection to meet their traffic
requirements.

Video processing streams typically require a lossless, in-order video
stream with guaranteed throughput, but possibly allow corrupted samples.
An Æthereal connection for such a stream would require the necessary
throughput, ordered transactions, and flow control. If the video stream is
produced by the master, only write transactions are necessary. In such a
case, with a flow-controlled connection there is no need to also require
transaction completion, because messages are never dropped, and the write
command and its data are always delivered at the destination. Data
integrity is always provided by our network, even though it may not be
necessary in this case.
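
The video example can be written down as a connection configuration. The sketch below is only an illustration: the `ConnectionConfig` type and its field names are hypothetical and do not correspond to an actual Æthereal programming interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionConfig:
    """Hypothetical per-connection property set."""
    ordered: bool                  # in-order transaction delivery
    flow_control: bool             # end-to-end flow control requested
    transaction_completion: bool   # acknowledged (completed) transactions
    guaranteed_throughput: bool    # slots reserved; otherwise best effort


# Video stream written by the master: lossless, in order, with reserved
# throughput. Completion is not requested because a flow-controlled
# connection never drops messages.
video_stream = ConnectionConfig(
    ordered=True,
    flow_control=True,
    transaction_completion=False,
    guaranteed_throughput=True,
)
```

Other traffic classes would set the same fields differently, which is the sense in which connections offer differentiated services.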

Another example is that of cache updates, which require uncorrupted,
lossless, low-latency data transfer, but ordering and guaranteed throughput
are less important. In such a case, a connection would not require any
time-related guarantees, because a low latency, even if preferable, is not
critical. Low latency can be obtained even with a best-effort connection.
The connection would also require flow control and guaranteed transaction
completion to ensure lossless transactions. However, no ordering is
necessary, because this is not important for cache updates, and allowing
out-of-order transactions can reduce the response time.

Conclusions

In this paper, we compare networks on chip (NoCs) to off-chip networks
(e.g., computer networks) and existing on-chip interconnects (e.g., busses).
We show that NoCs have many similarities with off-chip networks. However,
they also differ, especially in their resource constraints. For example,
on a chip, memory and computation resources are more expensive, while
there are more wires. This makes NoC architectures differ from off-chip
networks, and requires rethinking [...]

We also compare NoCs to existing on-chip interconnects, such as busses
and switches. By directly connecting IP blocks, existing interconnects
can offer tight coupling between masters and slaves, [...]
arbitration. In NoCs, masters and slaves are completely decoupled, and the
arbitration is distributed over the network nodes. This makes it harder to
provide guarantees, such as bandwidth lower bounds [...]
orderings.

We define a set of NoC services that abstract from the network details.
Using these services in the IP design decouples computation and
communication. We use a request-response transaction model to be close to existing
on-chip interconnect protocols. This eases the migration of current IPs to
NoCs. To fully utilize the NoC capabilities, such as high bandwidth and
transaction concurrency, our services provide connection-oriented
communication. Connections can be configured independently with different
properties. These properties include transaction completion, transaction
ordering, bandwidth lower bounds, latency and jitter upper bounds,
and flow control.

Our services are a prerequisite for service-based system design, which
makes applications independent of NoC implementations, makes designs
more robust, and enables architecture-independent quality-of-service
strategies.

Networks on Silicon: Blessing or Nightmare?

Abstract

Continuing VLSI technology scaling raises several deep
submicron problems like relatively slow interconnect,
power dissipation and distribution, and signal
integrity. Those problems are encountered particularly on
long wires for global interconnect. As clock frequencies
increase, wires become relatively slower, and hence
communication will be the limiting performance factor of
future chips. [...] We consider that networks on silicon (NoS),
[...] packets over shared (semi-)global wires. NoS performance
is expected to be high, but comes at a cost. Balancing the
performance and cost of a NoS is a major challenge, and
we believe busses still have a role to play.



1 Technology trend

VLSI technology scaling has long followed Moore's law.
No fundamental barriers have been identified that invalidate
this law for at least another decade [12]. Moore's law
predicts that chips in 2010 will count over 4 billion transistors,
operating in the multi-GHz range. This abundance of
transistors will make very complex systems on silicon (SoS)
possible.

However, challenges at all abstraction levels of design
will have to be addressed before such SoSs will become a
reality. The three most important deep submicron (DSM)
challenges, related to all abstraction levels, are: substantial
wire delay, controlling power delivery and dissipation, and
signal integrity.
Until recently, on-chip wiring was cheap. Consequently,
architectural models have been employed that relied on
low-latency communication to globally share expensive
computational resources. Global wire delay stays at best constant
under technology scaling and hence these wires become
effectively slower compared to a gate delay. For example,
in 130 nm technology the reachable distance of a repeated
global signal in a clock cycle is no more than the length of a
chip [4]. For 50 nm technology, crossing a chip with highly
optimized interconnect takes between six and ten clock
cycles, clearly invalidating the low-latency assumption of
today. Hence we must move to system-level architectures
that scale with technology.

Figure 1. The number of 50k blocks for future
process technologies.

A feasible template for a future-proof architecture is
constructed from processing nodes that do not grow in
complexity with technology. Instead, as technology scales, the
number of those processing nodes on the chip grows. An
on-chip communication network then combines these nodes
into a SoS [4].

Various publications show that the spanning wires in
blocks of 50k gates scale with technology [4, 13]. This
means that the aforementioned DSM issues can be handled
by CAD tools, assuming their evolutionary improvement.
Figure 1 shows the exponentially increasing amount of such
50k blocks for a large die in subsequent technologies; in
35 nm this number is approximately ten thousand (adapted
from [13] and [4]). It remains to find a communication
architecture that allows a SoS composed of these blocks to
operate efficiently.

2 Networks on silicon are inevitable

Given the growing demand for and impact of interconnect
on system cost and performance, it is worthwhile to
optimize the utilization of wires. Ad-hoc global wiring
structures often lead to a large number of wires with an
average use as low as 10% in time [2]. To control cost in
this scenario, the wire packing density must be very high,
which is not beneficial for the power and delay characteristics.
Efficient mechanisms for sharing (semi-)global wires
must solve this cost-performance dilemma.

In deep submicron technologies, (semi-)global wires
need special attention for power, signal-integrity, and
performance reasons. In the discussion below we show how
special circuit techniques can handle these issues. Such
techniques only work, however, when embedded in
dedicated communication IP, which provides a more abstract [...]
Power is an issue for global interconnect because it costs
more energy to send a bit of information over longer
wires. To reduce the communication delay, the energy
consumption increases, due to bigger drivers. Employing
low-swing signaling for the global wires saves up to a factor four
in power for those wires [15]. Implementing low-swing
signaling requires special circuit techniques.

Signal integrity is hampered increasingly by growing
capacitive and inductive coupling between wires. Capacitive
noise coupling is the result of the large aspect ratio of wires
in DSM technologies. Inductive noise coupling becomes
more of a problem due to the decreasing transition times. IR
drops¹ in the supply distribution increasingly contribute to
the noise. The most effective way to make a connection
robust against noise is application of differential signaling [7].
Differential signaling improves both the generation of and
sensitivity to noise.

The signal propagation delay of an uninterrupted wire
grows quadratically with its length; hence from a certain
length onward, it is advantageous to partition the wire in
segments with repeaters in between. The repeater insertion
technique improves bandwidth and latency, but at the cost of
higher power consumption. Wire delay can be reduced by
fat wires with a lower resistance per unit length at the cost
of lower wire density. Such wires behave like lossy
transmission lines and require drivers with a resistance matched
to the transmission line.
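
The quadratic growth, and the way repeaters turn it into roughly linear growth, can be illustrated with a first-order distributed-RC estimate. The sketch below is an assumption-laden illustration and not the authors' model: it uses the common approximation of 0.5·r·c·L² for the delay of an unrepeated wire, adds a fixed delay per repeater, and uses arbitrary unit values for all parameters.

```python
def wire_delay(r_per_mm, c_per_mm, length_mm, segments=1, repeater_delay=0.0):
    """First-order delay of a wire split into equal repeated segments.

    Each segment contributes 0.5 * r * c * (segment length)^2 plus the
    delay of the repeater that drives it (zero by default)."""
    segment_mm = length_mm / segments
    per_segment = 0.5 * r_per_mm * c_per_mm * segment_mm ** 2 + repeater_delay
    return segments * per_segment


unrepeated_2mm = wire_delay(1.0, 1.0, 2.0)   # baseline
unrepeated_4mm = wire_delay(1.0, 1.0, 4.0)   # twice as long, four times slower
repeated_4mm = wire_delay(1.0, 1.0, 4.0, segments=4, repeater_delay=0.1)
```

Under these assumptions, doubling the length of the unrepeated wire quadruples its delay, while the repeated wire is considerably faster because each short segment contributes only a small quadratic term.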

As a result, we believe that all inter-block communication
will be implemented by hard-macro transmitters and
receivers, employing low-swing differential signaling, with
well-controlled interconnect instead of ad-hoc drivers
handled by standard place-and-route tools. In this way,
communication links can be realized with predictable performance
and DSM robustness.

Currently, the prevalent on-chip interconnects are
busses [1]. In a bus architecture, devices share a single
transmission medium to communicate. At a given time,
only one device has access to the shared medium. An
arbitration mechanism is required to order simultaneous
accesses. Such functionality is typically performed by a
centralized bus arbiter. The performance of a shared-medium
bus scales badly. For an increasing number of bus clients
(i) individual clients get less bandwidth on average, and (ii)
increased capacitive loads and wire length decrease the total
bandwidth.

¹Supply voltage drops are caused by instantaneous currents flowing through
the resistance of the supply network. Since the supply voltage reduces
under scaling, IR drops worsen.

Figure 2. Structural view of a network on silicon
consisting of processing nodes (P) and
nodes supporting communication (R, B).

A solution that pairs scalable communication performance
and minimal interconnect cost is expected from
networks on silicon (NoS), where the SoS is considered as a
network of components [..., 1]. Figure 2 illustrates the
hardware architecture of this concept. The outer components
(marked P) exclusively perform processing and storage
functions, whereas the inner components (marked R and B)
form the NoS and cater to communication needs of the
outer components. The basic building blocks of a NoS are
routers (R).

A router forwards data from its input ports to its output
ports in a concurrent fashion. To that end, a router of
arity N contains an N × N switch matrix. Data packets
make their way through the network based on the routing
information in their headers. A link between two routers is
implemented by a point-to-point connection. The links
typically span medium to long distances, ranging from several
to more than twenty millimeters. The actual length
depends on the chosen topology of the network. For a mesh
topology the links are relatively short; for a torus, which is
a mesh with wrap-around connections, some links have a
length of half the edge of the chip. Links can be optimized
for bandwidth, latency, power, or a combination of these,
depending on performance requirements.

3 NoS requirements

An important characteristic of a future system-level
architecture is the separation between computation and
communication. A NoS allows the computational blocks to
communicate with one another via a uniform interface. A
uniform interface is advantageous because (i) it frees the
core developer from having to make assumptions about the
system in which the core will be used, and (ii) does not
constrain the development of newer communication
architectures by detailed interfacing requirements of particular
legacy SoC components [10]. Several on-chip bus standards
are evolving to realize this goal, most notably VCI put
forward by VSIA [14], and more recently, the Open Core
Protocol [10].

The fundamental aim of a NoS is to provide flexible and
efficient communication between the thousands of IP blocks
in a system, with performance guarantees. In a typical SoS,
the communication requirements of different IP blocks show
huge variations. For example, data rates may be constant
(e.g. digital video) or variable (e.g. compressed video). The
importance of latency and jitter also varies greatly. Finally,
the data granularity may range from single words to large
blocks. A NoS should be able to offer different service classes to
different clients. Each service class must be implemented
efficiently, using a shared uniform infrastructure.

A high utilization of the network comes at a price. When
the network starts to saturate, throughput and latency will
show huge variation, which is not acceptable in real-time
applications. Hence, the network should also provide
guarantees, like loss-less data transport, minimal bandwidth,
and bounded latency. The way packets are buffered and
scheduled in routers and the effects on performance
guarantees has been the subject of intense research.
Fundamentally, sharing and guarantees are conflicting, and
efficiently combining guaranteed traffic with best-effort traffic
is hard [11]. Although best-effort services are cheaper than
guaranteed services we believe that the latter are essential
because they enable compositional and scalable integration
of the IP blocks [5]. It is up to the IP integrator at design
time, and up to the application at run time, to make a trade
off [...]


4 Performance and cost analysis of NoSs



The vision of previous sections is that the design of future
SoSs will allow IP blocks to be plugged in at will to
[...] communication costs, but without today's problems
like timing closure. In this section we investigate the
cost implications of system design based on a NoS. We hope
the vision comes at acceptable cost. We hope that the overall
cost of a NoS, including the [...],
turns out to be acceptable, such that the [...] blessings
of NoSs do not change into a cost nightmare.

4.1 Performance

The aggregate bandwidth of a router is the product of the
bandwidth per port, $BW_{port}$, the arity of the router (number
of ports), $N$, and a utilization factor, $\alpha \le 1$, corresponding
to the router arbitration scheme:

$BW_{router} = \alpha \cdot N \cdot BW_{port}$    (1)

We discuss each in turn. The bandwidth per port is
determined by the bandwidth of the link and the router data path.
In short:

$BW_{port} = B \cdot \min(f_{wire}, f_{datapath})$    (2)

where $B$ is the width of the data path. The combined
bandwidth of the $B$ wires of a link is a function of the layout
characteristics (e.g. total length), chosen signaling
technique, and the budgets for power, delay, and area. A
first-order expression for the bandwidth of a repeated global wire
optimized for power-delay is given in terms of FO4, the delay
of an inverter driving four equally sized inverters [4]. In a
100 nm technology, this yields 5 Gbit/s per wire under
worst-case environmental conditions. Note that the bandwidth of
repeated global wires scales with technology because such
wires allow (wave) pipelining of the segments.

Running the router data path at 5 GHz is not feasible. An
aggressive but realistic frequency is 1.25 GHz, corresponding
to the clock frequency of 50k gates blocks [4]. The critical
component in the data path is an N × N switch. For N up
to 20 it meets this data rate, using N 1-out-of-N
multiplexers. The relaxed demand on the wires of the link
can be used to reduce power dissipation and area.

The utilization factor, α, reflects the effectiveness of
the router to resolve contention on the links. The queuing
strategy, the queue sizes, and the schedule algorithm all
strongly influence α. Accordingly, many queuing policies
and scheduling algorithms have been presented in the
literature. For example, α = 0.59 for infinite FIFO input queues
with uniform and independent traffic. (Virtual) output queuing
gives α = 1 under the same conditions, but at the
cost of larger queues and a more complex scheduling
algorithm [9]. Static scheduling techniques like
(time-division-multiplexed) circuit switching can also improve the
utilization factor.

Hence, in 100 nm technology, the bandwidth of a 32 bit
router port is approximately 5 GByte/sec.
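
The 5 GByte/sec figure can be checked numerically. In the sketch below the function and parameter names are hypothetical; the port bandwidth is taken as the data-path width times the slower of the wire rate and the data-path clock, and the router total as the product of utilization factor, arity, and port bandwidth, as described in the text. The arity of 5 in the last line is an arbitrary example value.

```python
def port_bandwidth(width_bits, f_wire_hz, f_datapath_hz):
    """Port bandwidth in bit/s: width times the slower of the two rates."""
    return width_bits * min(f_wire_hz, f_datapath_hz)


def router_bandwidth(alpha, arity, bw_port):
    """Aggregate router bandwidth: utilization factor x ports x port bandwidth."""
    return alpha * arity * bw_port


# 32-bit port, 5 Gbit/s wires, 1.25 GHz router data path (values from the text).
bw_port_bits = port_bandwidth(32, 5e9, 1.25e9)
bw_port_bytes = bw_port_bits / 8

# Hypothetical 5-port router with FIFO input queuing (alpha = 0.59).
bw_router_bits = router_bandwidth(0.59, 5, bw_port_bits)
```

The data path, not the wire, limits the port: 32 bits at 1.25 GHz is 40 Gbit/s, or 5 GByte/sec, while each wire could carry four times more.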

4.2 Cost

Three main components contribute to the area cost of a
router: the switch, the control logic, and the packet queues.
The switch allows N simultaneous connections from the
N inputs to the N outputs, which results in B arrays of N ×
N wires, giving rise to an O(N²) area cost.
The control logic of a router is made up of the
switch-matrix schedule unit and other configuration logic. The
delay of a schedule cycle varies greatly per algorithm
(for example, for virtual output queuing from O(1) to
O(N^{5/2}) [3]); it is important for two reasons. First, it
determines the lower bound for latency that a flit² incurs to
traverse the router. Second, it affects the size of the queues.
The longer a schedule cycle, the more data arrives, given a
fixed bandwidth of a port $BW_{port}$. This leads to deeper
queues, and higher area cost.

The three aforementioned queuing strategies require
queues of size O(N) to O(N²) flits. Scheduling algorithms
perform better with deeper queues, with a decreasing return.

Besides routers, a significant amount of area is consumed
by so-called network interface (NI) modules. These
modules translate the IP transactions for a given connection to
packets that are sent over the network, and vice versa.
Packets can be sent once the payload has been completely
accepted by the NI. Hence, the buffers must be dimensioned
such that at least a complete packet for every
simultaneously active connection can be stored.

The trade off between utilization α and the cost is a
complex one, but of importance to the viability of NoSs.

5 The future role of busses

In Sections 1 and 2 we have argued that NoSs are
essential to solve SoS integration in a scalable fashion. While
Section 4.2 raised some general cost issues, we will now
more concretely consider the trade off between busses
and NoSs. Will full-fledged NoSs completely replace
current busses in future SoSs? Or will a hybrid approach
prevail? We believe that shared busses may have a role
to play in first-level communication (B in Figure 2) for the
following reasons.

First, typical IP blocks underutilize the bandwidth
capacity of an individual router port. All router ports offer the
same bandwidth that is inherent to the architecture, whereas
the bandwidth requirements of IP blocks vary greatly. A
shared memory module has typically much higher (peak)
bandwidth than a streaming peripheral device. Single word
transfers, variable bit rates, bursty IO, and much lower clock
rates for IP blocks than for the NoS further waste
bandwidth. This means that the communication needs of a
number of IP blocks can be aggregated using a bus before the
capacity of a network link is reached.

Second, network interfaces are more expensive (in terms
of area) than bus adapters. Using a bus as a first-level traffic
concentrator, trading bus adapters for network interfaces,
thus reduces the overall cost of IP-NoS interfacing. We
expect that the overhead of a bus [...] is
outweighed.

²Flit stands for flow control digit, the atomic portion of data
processed in a cycle. A packet is decomposed into flits.

Figure 3. A shared-medium bus can be cost-effective
to connect the IP to the packet-switched
network.

Finally, the number of routers is reduced significantly
when busses are used as the first-level interconnect. Routers
are larger than busses due to their packet queues and more
complex scheduling. We give an example below.

An example of the heterogeneous communication
architecture is depicted in Figure 3. A router of arity three
surrounded by twelve IP blocks is shown. Two shared-medium
busses, each connected to six 50k gates IP blocks,
communicate with the router via two network interfaces. These
have two functions: first they schedule the transactions on
the bus, and second they give the bus clients access to the
packet-switched network. The third port of the router
provides communication to the remainder of the network.
Figure 4 shows an architecture using only routers. Now three
routers of arity five and one of arity four are needed.

The suggested shared-medium bus has a length of 38kλ,
where λ is half of the length of a minimal transistor. Global
wires of this length will not be the bottleneck of bus
performance³.



The feasibility of hybrid NoSs hinges on the right
implementation of the busses. First, they must be shared wires,
as opposed to switches. Second, their arbitration must be
combined, or at least compatible with, the scheduling taking
place in the network interfaces, to offer uniform end-to-end
network services.

We see a future for hybrid NoSs, with first-level
communication over a shared-medium bus, and the higher levels
using a packet-switched network. Perhaps a packet-switched
network can be seen as a distributed and scalable
implementation of a logical bridge that connects all the local busses of
the SoS. Deciding how many IP blocks can use a local bus
must be answered [...]

³Minimum-delay wire segments have a length of 28kλ, while segments
optimized for power-delay product are longer: 48kλ. These lengths
scale with technology like the edge of 50k blocks [4].

Figure 4. IP to IP communication based on a
homogeneous router network.

6 Conclusion

We have argued in Section 1 that future systems on silicon
(SoS) will be composed of large numbers of processing
nodes (or IP blocks). Each processing node is relatively
small (50k gates) to scale with technology, and can
be handled by CAD tools, assuming their evolutionary
improvement. The interconnect and communication between
these blocks then becomes an essential function in itself
(Section 2), leading to networks on silicon (NoS). A NoS
is based on packet switching to flexibly share link capacity
between the network clients, and to provide uniform
communication services over a uniform infrastructure. Both
efficiency, provided by best-effort traffic, and predictable
performance, such as guaranteed throughput and latency, are
important (Section 3). Efficiently combining them is a
challenge. Section 4 showed that the performance of a NoS
depends on many factors, but is expected to be high. The cost
of a NoS can be stated in terms of area (routers, network
interfaces), utilization of wires, and speed (latency). They can
be traded off against one another, but also, perhaps more
interestingly, against the cost of busses. A hybrid NoS using
shared-wire busses to communicate locally, and accumulating
traffic for a core router network is a promising
avenue that deserves to be investigated.

ABSTRACT

Managing the complexity of designing chips containing billions of
transistors requires decoupling computation from communication.
For the communication, scalable and compositional interconnects,
such as networks on chip (NoC), must be used. In this paper we
show that guaranteed services are essential in achieving this
decoupling. Guarantees typically come at the cost of inefficient
resource utilisation. To achieve efficiency, they must be used in
combination with best-effort services. We describe a NoC architecture
that efficiently combines guaranteed and best-effort services. The
key element of our NoC is a router consisting conceptually of two
parts: the so-called guaranteed throughput (GT) and best-effort (BE)
routers. Both offer data integrity, lossless and in-order
delivery. Additionally, the GT router offers guaranteed throughput and
[...] efficiently by sharing router resources, enabling high link
utilization. The guarantees are never affected by the BE traffic, and
links are efficiently utilized because BE traffic uses all bandwidth
left over by GT traffic. Connections are programmed using BE packets.
This programming model is robust, conceptually [...]. It
[...] deterministic and adaptive [...].
For our architectural choices, we show the
trade offs between hardware complexity and efficiency, and
motivate our choices.

1. INTRODUCTION

Recent advances in technology raise the challenge of managing
the complexity of designing chips containing billions of transistors.
A key ingredient in tackling this challenge is decoupling the
computation from communication [9, 15]. This decoupling allows IPs
(the computation part), and the interconnect (the communication
part) to be designed independently from each other.

Existing interconnects (e.g., busses) may no longer be feasible for
chips with many IPs, because of the diverse and dynamic communication
requirements. Networks on a chip (NoC) are emerging as an
alternative to existing on-chip interconnects because they (a) structure
and manage global wires in new deep-submicron technologies [2,
3, 4, ...], (b) share wires, lowering their number and increasing their
utilization [4, 6], (c) can be energy efficient and reliable [2], and
(d) are scalable when compared to traditional [...]




Decoupling the computation from communication requires that
services that IPs use to communicate (a) are well-defined, and
(b) hide the implementation details of the interconnect [9], see
Figure 1(a). NoCs again help, because they are traditionally
designed using layered protocol stacks [14], where each layer
provides a well-defined interface which decouples service usage from
service implementation [15, 3].

In particular, guaranteed services are essential because they
make the requirements on the NoC explicit, thus limiting the possible
interactions (a stricter contract) of IPs with the communication
environment. As a result, IP design is simpler. IPs can also be
designed independently, because their guaranteed services are not
affected by the interconnect or by other IPs. This is essential for a
compositional construction (design and programming) of systems
on chip. Moreover, for guaranteed services, failures are restricted
to the IP configuration phase (a service request is either granted or
denied by the NoC), which simplifies the IP programming model [6].
We view the guaranteed services to be offered by an interconnect
as a requirement from the applications.

The drawback of using guaranteed services is that they require
resource reservation for worst-case scenarios. As a consequence,
resources may not be efficiently utilized, which may not be
acceptable in a system on a chip where cost constraints are typically
very tight, see Figure 1(c). To overcome this problem, best-effort
services can be used for less critical communication requirements
to fully utilize the available resources. Using best-effort services,
however, provides no guarantees.

A compromise between using guarantees only and having an
efficient interconnect is to combine guaranteed and best-effort
services. Guarantees are provided for the guaranteed traffic, while
best-effort traffic may use all the resources not used by guaranteed
traffic. Guaranteed service would then be used for the critical
traffic requirements, and best-effort services for non-critical
traffic requirements.

Figure 1: Network services (a) hide the interconnect details and
allow reusable components to be built on top of them, (b) are
driven by the application requirements, (c) their efficiency
relies on technology and network organization, and (d) are built
using a layered approach.

In this paper, we first identify a set of network-independent
communication services that are essential in chip design. In the
subsequent sections, we show the trade-offs between efficiency and
cost that we make in our NoC. In Section 3, we present the trade-offs
and take decisions on network-related issues. In Section 4, we zoom
into the internals of the key component of our NoC: a router which
efficiently combines guaranteed and best-effort services.

2. SERVICES 

The increasing complexity of integrated circuits and the strong
time-to-market pressure require modular designs and IP reuse.
Decoupling computation from communication in chip designs serves
both these requirements [9]. This decoupling is realized by
defining communication interfaces that provide well-defined
services and hide the implementation details of the interconnect.

We show in Section 1 that guaranteed services are essential to
simplify IP design and integration. Examples of such guaranteed
services are data integrity, which assures that the data is delivered
uncorrupted; lossless data delivery, which means that no data is dropped
in the interconnect; and in-order data delivery, which specifies that
the order in which data is delivered is the same order in which it
has been sent. Other guarantees offer time-related bounds, such as
throughput and latency.

Guarantees require resource reservation for worst-case scenarios,
which can be expensive. For example, guaranteeing throughput
for a stream of data implies reserving bandwidth for its peak
throughput, even when its average is much lower. As a consequence,
when using guarantees, resources are often underutilized.

Resources are better utilized when best-effort traffic is used.
Best-effort services do not reserve any resources, and hence provide
no guarantees. As a consequence, their performance is affected by
boundary conditions, such as interconnect load. For example, a
transmission may temporarily take longer in a congested network,
or data may be lost if the network resolves congestion by dropping data.

Best-effort services use resources well because they are typically
designed for average-case scenarios as opposed to worst-case
scenarios. They are also easy and fast to use, as they require no
resource reservation. Their main disadvantage is their unpredictability:
one cannot rely on a given performance (i.e., they do not offer
guarantees). In the best case, if certain boundary conditions are
assumed, a statistical performance can be derived.

The requirements for guaranteed services and for efficient
resource utilization are conflicting. Our approach
to a predictable and low-cost interconnect is combining the
guaranteed and best-effort services in the same interconnect.
Guaranteed services would be used for critical traffic requirements, and
best-effort services for non-critical traffic requirements. For
example, a video processing IP will typically require a lossless, in-order
video stream with guaranteed throughput, but possibly allows
corrupted samples. Another example is cache updates, which require
uncorrupted, lossless, low-latency data transfer, but ordering and
guaranteed throughput are less important. In Section 4.3 we show
how combining guaranteed and best-effort services efficiently uses
common resources. In the remainder of this section we analyze the
minimum level of abstraction at which the communication services
must be offered to hide the network internals.

Traditionally, network services have been implemented and
offered using a layered protocol stack, typically aligned to the
ISO-OSI reference model [14]. NoCs also take this approach [2, 3, 6,
15], because it structures and decomposes the service implementation,
and the protocol stack concepts aid in positioning the services.

To achieve the decoupling of computation from communication,
the communication services must be offered at least at the level
of the transport layer in the OSI reference model. It is the first layer
that offers end-to-end services, hiding the network details; see
Figure 1(d) [3].

The lowest three layers in the protocol stack, namely the physical,
data-link and network layers, are network specific. Therefore, these
services should not be visible to the IPs if decoupling of
computation from communication is desired. However, these layers are
essential in implementing the services, because constructing
guarantees without guarantees at the layer below is either very
expensive, or even impossible. For example, implementing a lossless
communication on top of a lossy service requires acknowledgment,
data retransmission, and filtering duplicated data. This leads to a
significant increase in traffic, and also a trade-off between large
buffer space requirements and long delays. Even worse, providing
guarantees for time-related services is impossible if lower layers
do not offer these guarantees. For example, throughput cannot be
guaranteed if communication at a lower layer is lossy. As a
consequence, guarantees can only be built on top of guarantees, see
Figure 1(b). Similarly, a layer's efficiency is based on efficient
implementations of the layers below it, see Figure 1(c).

The NoC services that we consider essential for chip design are:
data integrity, lossless data delivery, in-order delivery, throughput,
and latency. Data integrity is always guaranteed. All the other
services can be guaranteed or not, depending on request. In the
next section, we describe briefly how these services are provided
in our NoC, and in Section 4 we describe in detail how our router
architecture enables an efficient implementation of these services.

3. NETWORKS ON CHIP

Currently, the prevalent on-chip interconnects are busses and
switches [10]. These are single-hop interconnects, meaning that
there is no storage in the interconnect itself. Scalable interconnects
require multiple hops with storage in every hop (routers). This
introduces a number of new issues, which we discuss in this section.

General computer network research is a mature research
field [16] which has many issues in common with NoCs. However,
two significant differences between computer networks and
on-chip networks make the trade-offs in their design different.
First, routers of a NoC are more resource constrained than
those in computer networks, in particular in the control complexity
and the amount of memory. Second, communication links of a
NoC are relatively shorter than those in computer networks,
allowing tight synchronization (network flow control).

These two characteristics have a direct impact on the NoC
service implementation. In a NoC it is possible to solve the data
integrity at the data-link layer at a low cost. We, therefore, assume it
solved at the network layer and higher. Lossless transport of data
is guaranteed by our routers. However, to allow consumers slower
than producers, the network may be allowed to drop data at its edge.
Consequently, the designer may choose either for (a) a lossless
connection (i.e., implementing end-to-end flow control), or (b) a lossy
connection (i.e., without flow control). In-order delivery is again
guaranteed by our router (i.e., routers do not reorder data between
a given input port and a given output port). End-to-end ordering
of data, however, has to be provided on top of this at the network
edge when data is transported on different routes with different
delays. Offering guaranteed and best-effort throughput and latency
services is also implemented by the routers. These router services
together with the programming model explained in Section 4.3.2
offer network throughput and latency services.

We identify four important issues in the design of the router
network architecture. These are: the switching mode, routing,
contention resolution, and network flow control. Equally important,
end-to-end flow control and congestion control are placed in our
NoC at the network edge instead of the routers; we omit
their discussion here.

3.1 Switching Mode

The switching mode of a network specifies how data and control
are related. We distinguish circuit switching and packet switching.

In circuit switching data and control are separated. First the
control is provided to the network (connection set up). This results in
a circuit over which all subsequent data of the connection is
transported. In time-division circuit switching bandwidth is shared by
time-division multiplexing connections over circuits. Circuit-switched
networks inherently offer time-related guaranteed services when
resources are reserved during connection set up.

In packet switching data is divided into packets, and every packet
is composed of a control part (the header) and a data part (the
payload). Network routers inspect, and possibly modify, the headers
of incoming packets to switch the packet to the appropriate
output port. Since in packet switching every packet is self-contained,
there is no need for a set-up phase to allocate resources.

3.2 Routing

Routing is the determination of the route (or path) that the data
follows from source to destination. There are two basic approaches:
source routing and destination routing. In source routing, the
network interface at the source computes the complete route to the
destination. In destination routing, only the network address of
the destination is specified, and every router selects the appropriate
output based on the address. We refer to [17] for several classes of
routing functions.

In circuit switching, routing takes place at set up,
once and for all data in that connection. In packet switching, it is
done for every individual packet sent over the network. In both
cases, source and destination routing are possible. We currently
consider source routing because it is independent of the
network topology, which is not yet determined.

3.3 Contention Resolution

When a router attempts to send multiple data items over the same
link at the same time, contention is said to occur. As only one
item can occupy a link at any point in time, a selection among
contending data must be made; this process is called contention
resolution. Three approaches exist: avoiding contention, dropping
data (one of the contending data items is transmitted and the
remainder are deleted), and scheduling (or sequentializing) data (all data
items are sent in turn; some data items are therefore delayed).

In circuit switching contention resolution takes place at set up
at the granularity of connections, so that data sent over different
connections do not conflict. Thus, there is no contention during
data transport, and time-related guarantees can be given.

In packet switching contention resolution takes place at the
granularity of individual packets. Dropping packets is possible, but for
a lossless service (a) it adds complexity to the network
(acknowledging, retransmission, etc.), and (b) it ultimately increases the
traffic because dropped packets need to be resent. Thus, scheduling
data is the only remaining option.

3.4 Network Flow Control

Network flow control, also called routing mode, deals with the
limited amount of buffering in routers and data acceptance between
routers. In circuit switching connections are set up. The data sent
over these connections is always accepted by the routers and hence
no network flow control is needed. In packet switching, data must
be buffered at every router before it is sent on. Because routers
have a limited amount of buffering, they accept data only when they
have enough space to store the incoming data.

There are three types of flow control, namely store-and-forward,
virtual cut-through, and wormhole routing. In store-and-forward
routing, an incoming packet is received and stored in its entirety before
it is forwarded to the next router. This requires storage for the
complete packet, and implies a per-router latency of at least the time
required for the router to receive the packet.

In virtual cut-through routing a packet is forwarded as soon as
the next router guarantees that the complete packet will be
accepted. Only when no guarantee is given, the whole packet is stored
in the router. Thus, virtual cut-through routing requires buffer space
for a complete packet, like store-and-forward routing, but allows
lower-latency communication.

In wormhole routing packets are split into so-called flits (flow
control digits). A flit is passed to the next router when that router
accepts that flit, even when there is not enough buffer space for the
complete packet. As soon as a flit of a packet is sent over an output
port, that output port is reserved for flits of that packet only. When
the first flit of a packet is blocked, the trailing flits can therefore
be spread over multiple routers, blocking the intermediate links.
Wormhole routing requires the least buffering (buffering flits instead
of packets) and also allows low-latency communication. However,
it is more sensitive to deadlock and generally results in lower link
utilization than virtual cut-through routing.

We prefer wormhole routing because it offers low latency, which
is one of our targeted services, and because it has the lowest cost in
terms of buffering, which is expensive on-chip.
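
The latency and buffering trade-off between the three flow-control schemes can be illustrated with a first-order model. The sketch below is only an illustration under assumptions that are not taken from the text: there is no contention, every link transfers one flit per cycle, and the function names are hypothetical.

```python
def store_and_forward_latency(hops, packet_flits):
    """Cycles until the last flit arrives when every router stores the whole packet."""
    # Each link traversal must deliver the complete packet before it is forwarded.
    return hops * packet_flits


def cut_through_latency(hops, packet_flits):
    """Contention-free latency for virtual cut-through and wormhole routing."""
    # The first flit advances one hop per cycle; the remaining flits pipeline behind it.
    return hops + (packet_flits - 1)


def buffer_flits_per_input(scheme, packet_flits):
    """Minimum buffer space per input, in flits, needed by each scheme."""
    if scheme in ("store-and-forward", "virtual cut-through"):
        return packet_flits  # must be able to hold a complete packet
    if scheme == "wormhole":
        return 1  # buffers flits instead of packets
    raise ValueError("unknown scheme: " + scheme)
```

Under these assumptions, an 8-flit packet crossing 4 links needs 32 cycles with store-and-forward but 11 cycles with virtual cut-through or wormhole routing, while only wormhole routing gets by with a single flit of buffering per input.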

4. A COMBINED GT-BE ROUTER

Section 2 defines our requirements for NoCs in terms of services
that are to be offered, in particular, both guaranteed and best-effort
services. The previous section introduces a number of general
networking issues that will be built upon here. In the following two
subsections we show that the guaranteed and best-effort services
can conceptually be described by two independent router
architectures. The combination of these two router architectures is
efficient and has a flexible programming model, as described in
Subsection 4.3.

4.1 A GT Router Architecture

Our guaranteed-throughput (GT) router must guarantee
uncorrupted, lossless and ordered data transfer, and both throughput and
latency over a finite time interval. As mentioned earlier, data
integrity is solved at the data-link layer; we do not address it further.
No data is dropped by the GT router because we use a variant of
circuit switching (described in the next section). Data is transported
in fixed-size blocks, further explained below. As only one block
is stored per input in the GT router, blocks remain ordered. We
now turn to the more challenging time-related guarantees of
throughput and latency.

4.1.1 Time-related Guarantees

Latency is defined as the time a packet spends in the network.
Guaranteeing latency, therefore, means that a worst-case upper
bound must be given for this time. Here we define throughput
for a given producer-consumer pair as the amount of data
transported by the network over a finite, fixed time interval.
Guaranteeing throughput means giving a lower bound.

We observe that guaranteeing latency in a lossless router is
difficult because contention requires scheduling and hence delays.
Guaranteeing throughput is less problematic. Rate-based packet
switching (for an overview see [13]) offers guaranteed throughput
over a finite period, and hence a latency bound. The bound is
high, however, and the cost of buffering is also high. Deadline-
based packet switching [13] offers preferential treatment for
packets close to their deadline. This allows differential latency
guarantees (under certain admissible traffic assumptions), but at high
buffer costs.

Circuit switching solves the contention at set up, so naturally
providing guaranteed latency and throughput. Circuits can be
pipelined to improve throughput [5], at the cost of additional
buffering and latency. Time-division multiplexing connections over
pipelined circuits additionally offers flexibility in bandwidth
allocation. This requires a notion of router synchronicity, which is
feasible because a NoC is better controllable than a general network.
We explain this variation in more detail in the next section. The
associated programming model is described in Section 4.3.2.

4.1.2 Contention-free Routing

A router uses a slot table to (a) avoid contention on a link, (b)
divide up bandwidth per link, and (c) switch data to the correct
output. Every slot table T has S fixed-size time slots (rows), and N
router outputs (columns). There is a logical notion of synchronicity:
all routers in the network are in the same slot. In a slot s at
most one block of data can be read/written per input/output port. In
the next slot (s+1)%S, the read blocks are written to their appropriate
output ports. Blocks thus propagate in a store-and-forward fashion.
The latency a block incurs per router is equal to the duration of a
slot. Bandwidth is guaranteed in multiples of the block size per S slots.

The entries of the slot table map outputs to inputs for every slot:
T(s, o) = i. An entry is empty when there is no reservation for
that output in that slot. No contention arises because there is at most
one input per output. Sending a single input to multiple outputs
(multicast) is possible.

The slots reserved for a block along its path from source to
destination increase by one (modulo S): if slot s is reserved in a router,
slot (s + 1)%S must be reserved in the next router on the path.
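
The slot arithmetic above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, and the choice to express the slot duration in clock cycles and the block size in words, are assumptions made for the example.

```python
def slots_along_path(first_slot, hops, num_slots):
    """Slot that must be reserved in each successive router on the path."""
    # The reserved slot increases by one (modulo S) in every next router.
    return [(first_slot + h) % num_slots for h in range(hops)]


def guaranteed_throughput(reserved_slots, num_slots, block_words, slot_cycles):
    """Lower bound on throughput, in words per cycle, for one connection."""
    # Each reserved slot carries one block per revolution of the S-slot table.
    return reserved_slots * block_words / (num_slots * slot_cycles)


def path_latency_cycles(hops, slot_cycles):
    """A block spends exactly one slot in every router on its path."""
    return hops * slot_cycles
```

For example, with S = 8 a connection that starts in slot 6 and crosses four routers uses slots 6, 7, 0 and 1; reserving two of the eight slots for 3-word blocks, with a slot lasting 3 cycles, bounds the throughput from below at 0.25 words per cycle.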

The assignment of slots to connections in the network is an
optimization problem, discussed in Section 4.3.3. Section 4.3.2
explains how slots are reserved in the network by means
of BE packets.

4.2 A BE Router Architecture

Best-effort (BE) traffic can have a better average performance
than offered by guaranteed services. This depends on boundary
conditions, such as network load, that are unpredictable.
Best-effort services thus fulfill our efficiency requirement, but without
offering time-related guarantees. This section describes an
architecture for a best-effort service with uncorrupted, lossless, in-order
data delivery.
A router's efficiency is influenced by both its complexity and
its utilization. In Section 3 we have justified our choice for
routing (source routing) and network flow control (wormhole). Now
we describe the contention resolution scheme that is used. It has
two components: buffering and scheduling. Our router prototypes
show that the buffering costs dominate the cost of the router. The
main trade-off in Section 4.2.1 is therefore between buffer costs and
link utilization, which are both critical resources. For the chosen
buffering strategy an efficient scheduling scheme is presented in
Section 4.2.2, trading off link utilization and complexity.



4.2.1 Buffering Strategy

The buffering strategy determines the location of buffers inside
the router. We distinguish input queuing, output queuing, and
virtual output queuing. In the following, N is the number of inputs
(equal to the number of outputs) of the router. We observe that in
a balanced solution the rates at which routers and links operate are
equal. Slower routers require more buffering, and faster routers are
not feasible as links operate at high speed.

In input queuing there is a single queue per input, resulting in
the lowest buffer cost (N logical queues in N physical memories)
of the three approaches. However, due to the so-called head-of-
line blocking, for large N network utilization saturates at 59% [8].
Therefore, input queuing results in weak utilization of the links.

Output queuing can increase the link utilization to 100% by
having N queues at each output, or N^2 queues, with as many physical
memories. It is better to have fewer, larger memories than more,
smaller memories because the overhead of small RAMs is very
high. Overclocking the router by a factor N to merge memories
is not possible, as argued previously. So the number of memories
depends quadratically on N; hence output queuing is not scalable.

Virtual output queuing [1] (VOQ) combines the advantages of
input queuing and output queuing. It has the buffering complexity
of input queuing and the link utilization of output queuing. As for
output queuing, there are N^2 logical queues, but they are stored
in N physical memories, as is the case for input queuing. For
every input i there are N queues Q(i, o), one for each output o, see
Figure 2. There is at most one write to these queues. The difference
between output queuing and VOQ is the additional constraint that there
can be at most one read from this group of N queues. This enables the
mapping of all N queues of the same input to one memory. This
additional constraint has to be taken into account for the scheduling.
100% link utilization can still be achieved when N is large [12].

We select VOQ because it combines high link utilization with
moderate buffer costs.
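
The buffer organization of the three strategies can be tabulated directly from the counts given above. The helper below is a small illustrative sketch; its name and the dictionary keys are hypothetical.

```python
def queue_organization(strategy, n):
    """Logical queues and physical memories for an n-input, n-output router."""
    if strategy == "input":
        # One queue per input, each in its own memory.
        return {"logical_queues": n, "physical_memories": n}
    if strategy == "output":
        # n queues at each output, with as many physical memories.
        return {"logical_queues": n * n, "physical_memories": n * n}
    if strategy == "voq":
        # n queues per input, all mapped onto that input's single memory.
        return {"logical_queues": n * n, "physical_memories": n}
    raise ValueError("unknown strategy: " + strategy)
```

For N = 4, output queuing would need 16 small memories, whereas VOQ keeps the same 16 logical queues in only 4 memories.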

4.2.2 Matrix Scheduling

This section shows how link contention and memory contention
(imposed by VOQ) are resolved. Matrix scheduling solves both
kinds of contention by ensuring that every VOQ memory is read at
most once, and every output (link) is written to at most once. The
scheduling problem can be modeled as a bipartite graph matching
problem as follows. Every input port i is modeled by a node u_i and
every output port o by a node v_o. There is an edge between u_i and
v_o if and only if queue Q(i, o) is non-empty. A match is a subset
of these edges such that every node is incident to at most one edge.
For example, Figure 3(c) is a match of Figure 3(a). The number of
edges in the match is its size. A match is maximal if no edges
can be added to it. A maximum size match is a match of the largest
possible size.

Although optimal, there are two reasons not to consider only
maximum size matches. First, maximum size matching algorithms
have O(N^2.5) complexity. Since flit scheduling is done at flit
rate, this is not feasible for large N. Second, maximum size
matching algorithms can be unfair, which can result in starvation, i.e.,
some queues are never served.

Figure 2: Schematic of a router using virtual output queuing.

Figure 3: The three steps of a single iSLIP iteration.

There are several matching algorithms; see [11] for a thorough
discussion. We select the iterative SLIP (iSLIP) matrix scheduling
algorithm [11], because it has a low complexity, avoids starvation,
and provides increasing performance as the number of iterations
grows. It reaches a maximal match in log2(N) iterations. Even a
single iteration considerably outperforms input queuing, and can be
efficiently implemented in hardware. Multiple iterations increase
the latency of the control path, and hence the flit size (as explained
in Section 4.3.1). We consider using 1-SLIP because multiple
iterations give only marginal improvements.

A single iSLIP iteration has three steps, illustrated by an example
in Figure 3 for N = 4. In the first stage, see Figure 3(a), every
non-empty queue Q(i, o) requests access to output port o from input
port i. In the second stage, see Figure 3(b), every output port o
grants one request, solving link contention at the output. In
the third stage, see Figure 3(c), every input port i accepts one grant,
to resolve memory contention at the input port. We extend iSLIP
to take network flow control into account.
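
The three steps of a single iteration can be sketched in software as follows. This is an illustrative model only: it uses the round-robin grant and accept pointers of the published iSLIP algorithm, it does not include the extension for network flow control mentioned above, and the function and variable names are hypothetical.

```python
def islip_iteration(nonempty, grant_ptr, accept_ptr):
    """One request/grant/accept iteration of iSLIP.

    nonempty[i][o] is True when queue Q(i, o) holds at least one flit.
    grant_ptr[o] and accept_ptr[i] are round-robin pointers, updated in place.
    Returns a dict mapping each matched input port to its output port.
    """
    n = len(nonempty)

    # Step 1 (request): every non-empty queue Q(i, o) requests output o.
    requests = {o: [i for i in range(n) if nonempty[i][o]] for o in range(n)}

    # Step 2 (grant): each output grants the first requester at or after its pointer.
    grants = {}  # input port -> outputs that granted it
    for o in range(n):
        if requests[o]:
            i = min(requests[o], key=lambda i: (i - grant_ptr[o]) % n)
            grants.setdefault(i, []).append(o)

    # Step 3 (accept): each input accepts the first grant at or after its pointer.
    match = {}
    for i, outs in grants.items():
        o = min(outs, key=lambda o: (o - accept_ptr[i]) % n)
        match[i] = o
        # Pointers advance only for accepted grants, which avoids starvation.
        grant_ptr[o] = (i + 1) % n
        accept_ptr[i] = (o + 1) % n
    return match
```

Calling the function twice on the same set of non-empty queues shows the fairness mechanism: an input whose request lost in the first iteration is served in the next one because the grant pointer of the contended output has moved past the previous winner.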

4.3 Combining the GT and BE Routers

The GT and BE router architectures are combined to share
resources, in particular the links, memories, and switches. Moreover,
best-effort traffic enables a packet-based programming model for
the guaranteed traffic, as shown later, in Section 4.3.2.

The principal constraint for a combined router architecture is that
guaranteed services are never affected by best-effort services.
Figure 4(a) shows that, conceptually, the combined router contains
both router architectures (fat lines represent data transport, thin
lines represent control transport). Incoming data is switched to
either the GT or the BE router. The GT traffic (i.e., traffic that is served
by the GT router) has the higher priority, to maintain guarantees.
This is enforced by the arbitration unit, which therefore affects the
best-effort scheduling. Furthermore, best-effort packets can
program the guaranteed router, as shown by the arrow labeled
program. Thin lines going from the right to the left indicate network
flow control, which is only required for best-effort packets, as
guaranteed blocks never encounter contention.

On a shared link only one BE or GT data item can be
sent at any point in time. The GT and BE memories can therefore be
merged, keeping the number of memories at N, with N^2+N logical queues
in total. Figure 4(b) shows that the data path, consisting of memories
and switch matrix, is shared, and that the control paths of the BE
and GT routers are separate, yet interrelated. Moreover, the
arbitration unit of Figure 4(a) has been absorbed by the BE router. The
following subsection shows how this can be done.

4.3.1 Arbitration and Flit Size

When combining GT and BE traffic in a single network, the
impact on the network flow control scheme must be taken into
account. Recall from Section 3.4 that a BE flit is the smallest unit at
which flow control is performed. In other words, the BE scheduling,
using iSLIP, can only react to GT blocks at flit granularity. To avoid
alignment problems, the block size (B words) is a multiple of the
flit size (F words), with B/F constant; we prefer a small B and
F to decrease the store-and-forward delay for guaranteed traffic.

We extend iSLIP to handle the combination of GT and BE traffic.
In this combination GT traffic always has priority over BE traffic.
This is to ensure that guarantees are never corrupted.

4.3.2 Programming Model

In this section we show how GT connections are set up and torn
down by means of BE packets. To ensure scalability, programming
must not require global or centralized resources. Section 4.1.2
explains why our contention-free routing uses slot tables; we now see
that they are distributed over routers for scalability.

Initially the slot table of every router is empty. This means that
GT connections can only be set up using BE packets, unless an
additional communication infrastructure is introduced solely for
programming. Two special packets, one of which is Start, are used to
initialize and start the NoC. They progress by flooding, and are
not subject to the usual network flow control. We will not discuss
them further. There are three system packets: SetUp, TearDown,
and AckSetUp. They are used to program the slot table in every
router on their path.

The SetUp packet is used to create a connection from a source
to a destination, and travels in the direction of the data
("downstream"). AckSetUp acknowledges a successful set up, and flows
upstream. The TearDown packet destroys (partially) existing
connections, and can travel in either direction. SetUp packets contain
the source of the data, the path to the destination, and a slot
number. Every router along the path of the SetUp packet checks if the
output to the next router in the path is free in the slot indicated by
the packet. If it is free, the output is reserved in that slot, and the
SetUp packet is forwarded with an incremented (modulo S) slot.
Otherwise, the SetUp packet is discarded and a TearDown packet
returns along the same path. Thus every path must be reversible;
this is the only assumption we make about the network topology.
These upstream TearDown packets free the slot, and continue with
a decremented slot. Downstream TearDown packets work similarly,
and remove existing connections. A connection is successfully
created when an AckSetUp is received; otherwise a TearDown is received.
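
The handling of a SetUp packet along its path can be sketched as follows. This is a simplified software model under stated assumptions, not the router implementation: the slot tables are plain dictionaries that record a hypothetical connection identifier per reserved output (rather than the input port), and the upstream TearDown is modelled as a loop that frees the reservations in reverse order.

```python
def set_up(slot_tables, path, first_slot, conn_id):
    """Try to reserve one slot per router along `path`; return True on success.

    slot_tables[router][slot] maps an output port to the connection holding it.
    path is a list of (router, output) pairs in downstream order.
    """
    num_slots = len(next(iter(slot_tables.values())))
    reserved = []
    slot = first_slot
    for router, output in path:
        table = slot_tables[router][slot]
        if output in table:
            # Output busy in this slot: the SetUp is discarded and a TearDown
            # travels upstream, freeing the slots reserved so far.
            for r, s, o in reversed(reserved):
                del slot_tables[r][s][o]
            return False
        table[output] = conn_id
        reserved.append((router, slot, output))
        slot = (slot + 1) % num_slots  # forwarded with an incremented slot
    return True  # corresponds to an AckSetUp reaching the source
```

A failed set up leaves all slot tables exactly as they were before the attempt, which mirrors the consistency property claimed for the programming model.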

The programming model is pipelined and concurrent (multiple
system packets can be active in the network simultaneously, also
from the same source) and distributed (active in multiple routers).
Given the distributed nature of the programming model, ensuring
consistency and determinism is crucial. The outcome of
programming may depend on the execution order of system packets, but is
always consistent. The next section shows how to use the
programming model.






4.3.3 Slot Allocation

This section explains ways to determine the slots specified in
SetUp packets. A slot allocation for a single connection requires
that, in every router along the path, the required output is free in the
appropriate slot. Therefore, interference of SetUp packets of
multiple connections can be completely avoided if connections are set
up with conflict-free slots or paths. All execution orders of SetUp
packets then give the same result.

Computing an optimal slot allocation is complex and requires a
global network view. It can be used only for small problem
instances. To reduce computational cost, heuristics can be used, but
this probably leads to non-optimal solutions. Compile-time slot
allocations from both approaches can be recreated deterministically
at run time, concurrently and distributed (because all SetUp
packets are conflict free).

At run time, a global view requires a centralized resource.
This impairs scalability and slows down programming. Run-time
distributed slot allocation is scalable, but lacks a global view. This
typically results in suboptimal allocation. Moreover, SetUp
packets may interfere, making programming more involved, and
perhaps non-deterministic. However, dynamic connection
management at high rates will require distributed slot allocation. In
a simple distributed greedy algorithm, all sources generate
random slot numbers for each set up until their connection
succeeds.
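
The greedy scheme described above can be sketched as a retry loop. This is a hedged illustration, not the authors' algorithm in detail: reservations are kept in one global set purely to make the example self-contained (in the NoC they are distributed over the routers), the retry limit max_attempts is an added assumption, and all names are hypothetical.

```python
import random


def try_path(reserved, path, first_slot, num_slots):
    """Reserve consecutive slots along path, or reserve nothing if any is taken."""
    wanted = [(router, (first_slot + h) % num_slots, out)
              for h, (router, out) in enumerate(path)]
    if any(entry in reserved for entry in wanted):
        return False  # models a SetUp that is answered by a TearDown
    reserved.update(wanted)
    return True


def greedy_allocate(reserved, path, num_slots, rng, max_attempts=100):
    """Draw random start slots until a set up succeeds; return the slot or None."""
    for _ in range(max_attempts):
        slot = rng.randrange(num_slots)
        if try_path(reserved, path, slot, num_slots):
            return slot
    return None
```

Because each source picks slots without a global view, the resulting allocation is conflict free but generally not optimal, and a source gives up (here after max_attempts tries) when no free slot sequence exists on its path.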

We conclude that our programming model allows both compile-
time and run-time slot allocation. Computational complexity,
deterministic results, and scalability can be balanced according to
system requirements.

5. CONCLUSIONS

Managing the complexity of designing chips containing billions
of transistors requires decoupling computation from
communication. For communication, networks on chip (NoC) are emerging
as an alternative for existing interconnects to solve technological,
performance, and scalability problems.

In this paper we show that guaranteed services are essential to
provide predictable interconnects that enable compositional system
design and integration. However, guarantees typically utilize
resources inefficiently. Best-effort services overcome this problem
but provide no guarantees. So, combining guaranteed and best-
effort services allows efficient resource utilization, while still
providing guarantees for critical traffic.

Time-related guarantees, such as throughput and latency, can
only be constructed on a NoC that intrinsically has these properties.
We therefore define a router-based NoC architecture that combines
guaranteed and best-effort services. Thus, the router architecture
has conceptually two parts: the guaranteed-throughput (GT) and
best-effort (BE) routers. Both offer data integrity, lossless data
delivery, and in-order data delivery. Additionally, the GT router offers
guaranteed throughput and latency services using pipelined circuit
switching with time-division multiplexing. This requires a notion
of synchronicity: at each time slot at most one block of data is
communicated over a link. The GT router has low latency and
moderate memory requirements. The BE router uses packet switching,
wormhole routing, and virtual output queuing with iSLIP. The BE
router has low latency, high link utilization, and moderate memory
requirements.

We combine the GT and BE router architectures efficiently by
sharing router resources. The guarantees are never affected by the
BE traffic, and links are efficiently utilized because BE traffic uses
all bandwidth left over by GT traffic. Connections are programmed
using BE packets. The programming model is robust and distributed.
It enables run-time and compile-time, deterministic and adaptive
connection management.
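
The sharing rule stated above (GT traffic is never affected by BE
traffic, and BE traffic uses whatever bandwidth GT traffic leaves over)
can be sketched as a per-slot link arbiter. This is a simplified
illustration, not the router's actual implementation: it treats BE
traffic as single blocks rather than wormhole-routed packets, and the
function and parameter names are assumptions.

```python
from collections import deque


def schedule_link(slot_owner, gt_queues, be_queue, num_cycles):
    """Return the block sent on one link in each cycle (illustrative sketch).

    slot_owner[s] is the GT connection that owns slot s, or None.
    gt_queues maps each GT connection id to a deque of pending blocks.
    be_queue is a deque of pending best-effort blocks.
    """
    sent = []
    for cycle in range(num_cycles):
        owner = slot_owner[cycle % len(slot_owner)]
        if owner is not None and gt_queues[owner]:
            # A reserved slot with pending GT data always goes to GT.
            sent.append(gt_queues[owner].popleft())
        elif be_queue:
            # Unreserved slots, and reserved slots left idle, go to BE.
            sent.append(be_queue.popleft())
        else:
            sent.append(None)  # the link is idle in this cycle
    return sent
```

In this sketch a GT block waits only for its own reserved slot,
regardless of how much BE traffic is queued, while BE traffic fills every
slot that GT traffic does not use.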

For all our architecture choices, we show the trade-offs between
hardware complexity and efficiency, and motivate our choices.

In conclusion, we describe and motivate a combined guaranteed
and best-effort router, which is an essential component in a NoC.
It fulfills our requirements by providing guaranteed services, and
satisfies the efficiency constraint by good resource utilization.

1. Integrated circuit comprising a plurality of modules, and a network arranged
for transferring messages between the modules, wherein a message issued by a
module comprises first information indicative for a location of an addressed module
within the network, and second information indicative for a location within the
addressed module,

characterized in that the first and the second information are arranged as a single
address from which the network determines which module is addressed, and from
which the addressed module determines which of its locations is selected.

2. Method for exchanging messages in an integrated circuit comprising a
plurality of modules, the messages between the modules being exchanged via a
network, wherein a message issued by a module comprises first information indicative
for a location of an addressed module within the network, and second information
indicative for a location within the addressed module,

characterized in that the first and the second information are arranged as a single
address from which the network determines which module is addressed, and from
which the addressed module determines which of its locations is selected.

3. Integrated circuit comprising a plurality of processing modules and a network
arranged for providing at least one communication channel between a first and a second
module, which communication channel supports transactions comprising outgoing
messages from the first module to the second module and return messages from the
second module to the first module, characterized in that the network manages the
outgoing messages in a way different from the return messages.

4. Method for exchanging messages in an integrated circuit comprising a
plurality of modules, the messages between the modules being exchanged via a
network, wherein a communication channel through the network supports transactions
comprising outgoing messages from the first module to the second module and return
messages from the second module to the first module, characterized in that the
network manages the outgoing messages in a way different from the return messages.

5. Integrated circuit according to claim 3, wherein the network has a first mode
wherein a message is transferred within a guaranteed time interval, and a second
mode wherein a message is transferred as fast as possible with the available resources,
wherein the outgoing transaction is a read message, requesting the second module to
send data to the first module, wherein the return transaction is the data generated by
the second module upon this request, and wherein the outgoing transaction is
transferred according to the second mode, and the return transaction is transferred
according to the first mode.

6. Integrated circuit according to claim 3, wherein the network supports at least
two of the following transaction modes: unordered, locally ordered and globally
ordered, wherein an unordered transaction mode of the network gives no guarantees
about the order in which messages will arrive at their destination, a locally ordered
transaction mode guarantees that messages sent to the same destination will arrive in
the same order as they were sent, a globally ordered transaction mode guarantees that
messages will arrive in the same order as they were sent even if they are sent to
different destinations, wherein outgoing and return transactions are handled according
to different transaction modes.

7. Integrated circuit according to claim 3, wherein the network reserves a first
and a second buffer space for the first and the second module respectively, the
buffer spaces having a mutually different size.

8. Integrated circuit comprising a plurality of modules, which modules are
arranged to communicate to each other via a network, wherein the network is
arranged to distribute a message from a first module to two or more second modules,
and wherein the second modules are arranged to generate an acknowledge message
indicating receipt of the message from the first module,

the network being arranged to generate a single return message to the first module, in
dependence of the acknowledge messages of the second modules.

9. Integrated circuit according to claim 8, wherein the single return message
indicates that at least one of the second modules has received the message issued by
the first module.

10. Integrated circuit according to claim 8, wherein the single return message
indicates that each of the second modules has received the message issued by the first
module.

11. Integrated circuit comprising a first plurality of processing modules and a
network, the network comprising a second plurality of nodes and interconnections
between nodes, the network being arranged for transferring messages between a first
and a second module via a path through the network, the processing modules coupled
to the network via a network interface having a buffer for receiving incoming
messages, wherein a message from a first to a second module is not initiated until the
buffer has sufficient space for receiving a return message from the second module.
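
The condition in claim 11 can be illustrated with a minimal sketch of a
network interface that reserves buffer space for the return message
before it lets a request leave. The class name, the method names and the
word-based size accounting are assumptions made for illustration only;
the claim does not prescribe any of them.

```python
class NetworkInterface:
    """Sketch of a network interface with a buffer for incoming messages.

    A request is initiated only if the buffer can still hold the return
    message that the request will trigger (sizes counted in words, which
    is an assumption of this sketch).
    """

    def __init__(self, buffer_words):
        self.free_words = buffer_words
        self.sent = []  # requests that were actually initiated

    def try_send(self, request, response_words):
        if response_words > self.free_words:
            return False  # not enough space yet: do not initiate
        self.free_words -= response_words  # reserve space for the reply
        self.sent.append(request)
        return True

    def consume_response(self, response_words):
        # The module has read the return message; release its space.
        self.free_words += response_words
```

With a four-word buffer, a request expecting a three-word reply is sent
at once, while a second request expecting two words is held back until
the first reply has been consumed.
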

Networks are emerging as a feasible solution for on-chip
interconnects. In this paper, we describe how networks
on chip (NoC) are similar to and differ from
both off-chip networks (e.g., computer networks) and
current on-chip interconnects (e.g., buses). We re-examine
the communication services in the context of NoCs. We
provide services that abstract from network implementations,
enabling a clean separation between the NoC and
IP blocks. We define a request-response transaction model
similar to bus protocols, making our approach backward
compatible. To exploit the full power of NoCs,
we also provide connection-oriented communication with
differentiated services. Examples are bandwidth guarantees,
transaction orderings, and end-to-end flow control.

Keywords: Network on chip, on-chip buses, computer
networks, communication services, protocol stacks,
transaction, connection.